My DynamoDB table has timestamp (in YYYY-MM-DD HH:MM:SS format) as the primary key column and temperature as the sort key, with items shaped like {"humidity": 42, "location": "room", "temperature":, "thermostat":}.
In boto3 (Python) I need to scan on timestamp (between now and 15 minutes ago), with the condition that if (temperature - thermostat) > 5 more than 10 times, I return thermostat - 5, and if (temperature - thermostat) < 5 more than 10 times, I return thermostat + 5. The following is the code:
import boto3
import dateutil.tz
from datetime import datetime, timedelta
from boto3.dynamodb.conditions import Key, Attr

client = boto3.client('dynamodb')
dynamodb = boto3.resource('dynamodb')

def lambda_handler(event, context):
    #table_name = "thermostat_dynamo"
    table_name = "TableDynamo"
    table = dynamodb.Table(table_name)
    #key_param = "thermostat"
    #thermostatVal = table.get_item(Key={key_param: event[key_param]})  # get record from dynamodb for this sensor
    thermostatVal = 77
    south = dateutil.tz.gettz('Asia/Kolkata')
    now = datetime.now(tz=south)
    fifteen_min_ago = now - timedelta(seconds=900)
    now = now.strftime('%Y-%m-%d %H:%M:%S')
    fifteen_min_ago = fifteen_min_ago.strftime('%Y-%m-%d %H:%M:%S')
    fe = Key('timestamp').between(fifteen_min_ago, now)
    response = table.scan(FilterExpression=fe & Attr('temperature').lt(thermostatVal))
    if response['Count'] == 10:
        #return thermostatVal + 5
        thermonew = thermostatVal + 5
        tosensor = '{"thermostat": ' + str(thermonew) + '}'
        print(tosensor)
        #response = client.publish(topic="updatehomesensor", qos=1, payload=tosensor)
        return
    elif response['Count'] < 10:
        print('{"thermostat": ' + str(thermostatVal) + '}')
        return
If timestamp were a sort key, you could have used a Query request to retrieve all the items with timestamp > now-15min.
However, unfortunately, timestamp is your hash key. The only way to find the items with timestamp > now-15min is to Scan through all your items. This will cost you a lot of money: you pay Amazon for each item scanned, not for each item returned after filtering.
Another problem is that the DynamoDB filtering syntax (look at the FilterExpression documentation) doesn't actually allow addition and subtraction as part of the test. If you always want "temperature - thermostat", you can store that difference as one attribute (so you can use it in a FilterExpression) and "thermostat" as a second attribute, and later add the two up to recover "temperature".
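To illustrate, here is a minimal sketch assuming the table were re-keyed with a fixed partition key (device_id, a hypothetical name) and timestamp as the sort key, and assuming each item is written with a precomputed temp_minus_thermo attribute (also hypothetical):

import boto3
from datetime import datetime, timedelta
from boto3.dynamodb.conditions import Key, Attr

dynamodb = boto3.resource('dynamodb')
table = dynamodb.Table('TableDynamo')  # assumed re-keyed: device_id (hash), timestamp (sort)

thermostat_val = 77
now = datetime.utcnow()
fifteen_min_ago = (now - timedelta(minutes=15)).strftime('%Y-%m-%d %H:%M:%S')
now = now.strftime('%Y-%m-%d %H:%M:%S')

# A Query reads (and bills for) only the items in the time window, unlike a Scan.
response = table.query(
    KeyConditionExpression=Key('device_id').eq('room')
        & Key('timestamp').between(fifteen_min_ago, now),
    # FilterExpression cannot do arithmetic, so filter on the difference
    # attribute written alongside each reading.
    FilterExpression=Attr('temp_minus_thermo').gt(5),
)
if response['Count'] > 10:
    print('{"thermostat": %d}' % (thermostat_val - 5))

The precomputed attribute trades a little write-time work for the ability to filter server-side.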
Hi, I am trying to run a query with the pandas_gbq library; however, I can't reference a variable inside the query. How do I do so?
def dataBQ(maxDate):
    # load data
    dat = pd.read_csv("data/source/rawdat.csv", delimiter=";")
    # convert to datetime format
    dat['date_key'] = pd.to_datetime(dat['date_key'], format='%d/%m/%Y').dt.date
    # get latest date
    maxDate = dat['date_key'].max()
    dataTraffic = """
        SELECT *
        FROM
        `fileData` WHERE
        date_key > {maxDate}
    """
    dataBQ = pandas_gbq.read_gbq(dataTraffic, project_id=projectId)
How do I reference maxDate in the dataTraffic query?
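A minimal sketch of one common fix, assuming plain string substitution is acceptable here: use str.format (or an f-string) to splice the computed date into the query text, quoting it so BigQuery sees a DATE literal (project_id remains a placeholder to fill in):

import pandas as pd
import pandas_gbq

def dataBQ(project_id):  # project_id passed in instead of the undefined global projectId
    dat = pd.read_csv("data/source/rawdat.csv", delimiter=";")
    dat['date_key'] = pd.to_datetime(dat['date_key'], format='%d/%m/%Y').dt.date
    max_date = dat['date_key'].max()

    # .format() substitutes the value; the quotes make it a DATE literal in BigQuery
    data_traffic = """
        SELECT *
        FROM `fileData`
        WHERE date_key > '{maxDate}'
    """.format(maxDate=max_date)

    return pandas_gbq.read_gbq(data_traffic, project_id=project_id)

For untrusted input, BigQuery query parameters would be safer than string substitution, but for a date computed locally this is the simplest change.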
Seeking guidance on the following error:
google.api_core.exceptions.InvalidArgument: 400 order by clause cannot contain duplicate fields end_date
I am trying to create an endpoint that searches for documents that have an end date between two dates and allows pagination (i.e. starting from a particular document).
From the error it seems we cannot use the same field twice (i.e. to search between two dates, in my case) when also starting from a particular document, although you can search between two dates without issue when pagination isn't needed.
The below code reliably reproduces the error:
from datetime import datetime, timezone
from dateutil.relativedelta import relativedelta
from google.cloud import firestore

one_week_ago = datetime.now(timezone.utc) - relativedelta(weeks=1)
one_weeks_time = datetime.now(timezone.utc) + relativedelta(weeks=1)
collection_name = '...'
starting_doc_id = 'FnAFSazMlXwYWfnEzS1x'  # used to support pagination

client = firestore.Client()
collection_ref = client.collection(collection_name)
start_at_snapshot = collection_ref.document(starting_doc_id).get()
collection_ref = collection_ref.start_at(start_at_snapshot)
collection_ref = collection_ref.where('end_date', '>=', one_week_ago).where('end_date', '<=', one_weeks_time)

for item in collection_ref.stream():
    print(item.id)
To get it to work, I had to set an explicit ordering by including .order_by('end_date').
The documentation was not clear about this, so I hope this helps someone in the future.
The final working code is:
from datetime import datetime, timezone
from dateutil.relativedelta import relativedelta
from google.cloud import firestore

one_week_ago = datetime.now(timezone.utc) - relativedelta(weeks=1)
one_weeks_time = datetime.now(timezone.utc) + relativedelta(weeks=1)
collection_name = '...'
starting_doc_id = 'FnAFSazMlXwYWfnEzS1x'  # used to support pagination

client = firestore.Client()
collection_ref = client.collection(collection_name)
start_at_snapshot = collection_ref.document(starting_doc_id).get()
collection_ref = collection_ref.start_at(start_at_snapshot)
# note: only this line changed
collection_ref = collection_ref.order_by('end_date').where('end_date', '>=', one_week_ago).where('end_date', '<=', one_weeks_time)

for item in collection_ref.stream():
    print(item.id)
I'm new to Python and programming. I need to create a Lambda function using Python 3.7 that will look for a specific tag/value combination and return the tag value along with the instance ID. I can get both with my current code, but I'm having a hard time figuring out how to combine them: boto3.resource gives me the tag value and boto3.client gives me the instance ID.
I have thousands of EC2 instances where we need to monitor the value of the tag 'expenddate', compare it (mm/dd/yy) to the current date (mm/dd/yy), and alert when the 'expenddate' value is less than the current date.
import boto3
import datetime

def lambda_handler(event, context):
    today = datetime.date.today()
    mdy = today.strftime('%m/%d/%y')

    ec2 = boto3.resource('ec2')
    for instance in ec2.instances.all():
        if instance.tags is None:
            continue
        for tag in instance.tags:
            if tag['Key'] == 'expenddate':
                # note: this compares the mm/dd/yy strings lexicographically
                if tag['Value'] <= mdy:
                    print("Tag has expired!!!!!!!!!!!")
                else:
                    print("goodby")

    client = boto3.client('ec2')
    resp = client.describe_instances(Filters=[{
        'Name': 'tag:expenddate',
        'Values': ['*']
    }])
    for reservation in resp['Reservations']:
        for instance in reservation['Instances']:
            print("InstanceId is {} ".format(instance['InstanceId']))
I want to end up with a combined instance ID and tag value, or two variables that I can combine later.
Change
print ("Tag has expired!!!!!!!!!!!")
to
# initialise array
expiredInstances = []
...
print("%s has expired" % instance.id)
expiredInstances.append({'instanceId': instance.id, 'tag-value': tag['Value']})
That will give you an array of instance IDs with their tag values.
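Put together, a minimal sketch of the combined handler (parsing the mm/dd/yy tag value into a date is my assumption, so the comparison isn't a plain string compare):

import boto3
from datetime import datetime, date

def lambda_handler(event, context):
    today = date.today()
    ec2 = boto3.resource('ec2')
    expiredInstances = []
    for instance in ec2.instances.all():
        for tag in instance.tags or []:
            if tag['Key'] != 'expenddate':
                continue
            # parse mm/dd/yy so the comparison is by date, not by string
            if datetime.strptime(tag['Value'], '%m/%d/%y').date() <= today:
                print("%s has expired" % instance.id)
                expiredInstances.append({'instanceId': instance.id,
                                         'tag-value': tag['Value']})
    return expiredInstances

Only the resource API is needed here, since instance.id and the tag value are both available from it.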
I am writing a Python Lambda function to describe the list of RDS snapshots created today. The challenge is how to convert datetime.datetime.today() into a format which the RDS client understands.
UPDATE: I have implemented some of the suggested changes; I added a string variable to convert the date expression into a format which the boto3 RDS client understands. The documentation shows the field as:
'SnapshotCreateTime': datetime(2015, 1, 1),
today = datetime.today().date()
rds_client = boto3.client('rds')
snapshots = rds_client.describe_db_snapshots(SnapshotType='automated')
harini = "datetime(" + today.strftime('%Y,%m,%d') + ")"
print(harini)
print(snapshots)
for i in snapshots['DBSnapshots']:
    if i['SnapshotCreateTime'].date() == harini:
        print(i['DBSnapshotIdentifier'])
print(today)
It is still unable to retrieve the list of automated snapshots created today.
SnapshotCreateTime is a datetime.datetime object. So, you can just do i['SnapshotCreateTime'].date() to get the date.
import boto3
from datetime import datetime

today = datetime.today().date()
rds_client = boto3.client('rds')
snapshots = rds_client.describe_db_snapshots()

for i in snapshots['DBSnapshots']:
    if i['SnapshotCreateTime'].date() == today:
        print(i['DBSnapshotIdentifier'])
print(today)
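One caveat: describe_db_snapshots returns results in pages, so with many snapshots a paginator avoids missing any. A minimal sketch using boto3's built-in paginator for this call:

import boto3
from datetime import datetime

today = datetime.today().date()
rds_client = boto3.client('rds')

# Paginate so snapshots beyond the first page of results are not missed
paginator = rds_client.get_paginator('describe_db_snapshots')
for page in paginator.paginate(SnapshotType='automated'):
    for snap in page['DBSnapshots']:
        if snap['SnapshotCreateTime'].date() == today:
            print(snap['DBSnapshotIdentifier'])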
I've got a Python CGI script that pulls data from a GPS service; I'd like this information to be updated on the webpage about once every 10s (the maximum allowed by the GPS service's TOS). But there could be, say, 100 users viewing the webpage at once, all calling the script.
I think the users' scripts need to grab data from a buffer page that itself only updates once every ten seconds. How can I make this buffer page auto-update if no one is directly viewing the content (and not accessing the CGI)? Are there better ways to accomplish this?
Cache the results of your GPS data query in a file or database (sqlite), along with a datetime.
You can then do a datetime check against the last cached datetime to decide whether to initiate another GPS data query.
You'll probably run into concurrency issues with CGI and the datetime check, though...
To get around the concurrency issues, you can use sqlite and put the write in a try/except.
Here's a sample cache implementation using sqlite.
import datetime
import sqlite3

class GpsCache(object):
    db_path = 'gps_cache.db'

    def __init__(self):
        self.con = sqlite3.connect(self.db_path)
        self.cur = self.con.cursor()

    def _get_period(self, dt=None):
        '''normalize time to 15-minute periods'''
        if dt.minute < 15:
            minute_period = 0
        elif 15 <= dt.minute < 30:
            minute_period = 15
        elif 30 <= dt.minute < 45:
            minute_period = 30
        else:
            minute_period = 45
        period_dt = datetime.datetime(year=dt.year, month=dt.month, day=dt.day, hour=dt.hour, minute=minute_period)
        return period_dt

    def get_cache(self, dt=None):
        period_dt = self._get_period(dt)
        select_sql = 'SELECT * FROM GPS_CACHE WHERE date_time = ?;'
        self.cur.execute(select_sql, (period_dt.strftime('%Y-%m-%d %H:%M'),))
        row = self.cur.fetchone()
        return row[0] if row else None  # None when nothing is cached for this period

    def put_cache(self, dt=None, data=None):
        period_dt = self._get_period(dt)
        insert_sql = 'INSERT ....'  # edit to your table structure
        try:
            self.cur.execute(insert_sql)
            self.con.commit()
        except sqlite3.OperationalError:
            # assume the db is being updated by another process with the current results and ignore
            pass
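Note the class assumes a GPS_CACHE table already exists; a minimal sketch of a one-time setup with a schema that would fit the queries above (the column names are assumptions):

import sqlite3

con = sqlite3.connect('gps_cache.db')
# date_time holds the normalized period; data holds the cached GPS payload
con.execute('CREATE TABLE IF NOT EXISTS GPS_CACHE (date_time TEXT PRIMARY KEY, data TEXT)')
con.commit()
con.close()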
So we have the cache tool; now the implementation side.
You'll want to check the cache first; if it's not 'fresh' (doesn't return anything), go grab the data using your current method, then cache the data you grabbed.
You should probably organize this better, but you should get the general idea here.
Using this sample, you just replace your current calls to 'remote_get_gps_data' with 'get_gps_data'.
import datetime

from gps_cacher import GpsCache

def remote_get_gps_data():
    # your function here
    return data

def get_gps_data():
    data = None
    gps_cache = GpsCache()
    current_dt = datetime.datetime.now()
    cached_data = gps_cache.get_cache(current_dt)
    if cached_data:
        data = cached_data
    else:
        data = remote_get_gps_data()
        gps_cache.put_cache(current_dt, data)
    return data
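One mismatch worth flagging: the sample's _get_period normalizes to 15-minute periods, while the question's TOS window is 10 seconds. A minimal sketch of a period function rounding to 10-second boundaries instead (my adaptation, not part of the original answer):

def _get_period(self, dt=None):
    '''normalize time to 10-second periods to match the GPS TOS window'''
    # round down to the nearest 10-second boundary
    return dt.replace(second=(dt.second // 10) * 10, microsecond=0)

If you use this, the strftime format in get_cache would also need seconds ('%Y-%m-%d %H:%M:%S') so that different periods don't collide on the same cache key.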