I am creating a SAM web app, with the backend being an API in front of a Python Lambda function, with a DynamoDB table that maintains a count of the number of HTTP calls to the API. The API must also return this number. The YAML template itself loads normally. My problem is writing the Lambda function to increment and return the count. Here is my code:
import json
import os
from decimal import Decimal

import boto3

def lambda_handler(event, context):
    dynamodb = boto3.resource("dynamodb")
    ddbTableName = os.environ["databaseName"]
    table = dynamodb.Table(ddbTableName)

    # Update item in table or add if doesn't exist
    ddbResponse = table.update_item(
        Key={"id": "VisitorCount"},
        UpdateExpression="SET count = count + :value",
        ExpressionAttributeValues={":value": Decimal(context)},
        ReturnValues="UPDATED_NEW",
    )

    # Format dynamodb response into variable
    responseBody = json.dumps({"VisitorCount": ddbResponse["Attributes"]["count"]})

    # Create api response object
    apiResponse = {"isBase64Encoded": False, "statusCode": 200, "body": responseBody}

    # Return api response object
    return apiResponse
I can get VisitorCount to be a string, but not a number. I get this error:

[ERROR] TypeError: lambda_handler() missing 1 required positional argument: 'cou
response = request_handler(event, lambda_context)le_event_request
What is going on?
[UPDATE] I found the original error, which was that the function was not properly received by the SAM app. Changing the name fixed this, and it is now being read. Now I have to troubleshoot the actual Python. New Code:
import json
import boto3
import os

dynamodb = boto3.resource("dynamodb")
ddbTableName = os.environ["databaseName"]
table = dynamodb.Table(ddbTableName)
Key = {"VisitorCount": {"N": "0"}}

def handler(event, context):
    # Update item in table or add if doesn't exist
    ddbResponse = table.update_item(
        UpdateExpression="set VisitorCount = VisitorCount + :val",
        ExpressionAttributeValues={":val": {"N": "1"}},
        ReturnValues="UPDATED_NEW",
    )

    # Format dynamodb response into variable
    responseBody = json.dumps({"VisitorCount": ddbResponse["Attributes"]["count"]})

    # Create api response object
    apiResponse = {"isBase64Encoded": False, "statusCode": 200, "body": responseBody}

    # Return api response object
    return apiResponse
I am getting a syntax error on Line 13, which is
UpdateExpression= "set VisitorCount = VisitorCount + :val",
But I can't tell where I am going wrong. It should update the DynamoDB table to increase the count by 1. Looking at the AWS guide, it appears to be the correct syntax.
I'm not sure what the exact error is, but the update_item call should look like this:
from decimal import Decimal

ddbResponse = table.update_item(
    Key={
        'key1': aaa,
        'key2': bbb
    },
    UpdateExpression="set VisitorCount = VisitorCount + :val",
    ExpressionAttributeValues={":val": Decimal(1)},
    ReturnValues="UPDATED_NEW",
)
Specify the item to be updated with Key (one item per Lambda call).
Set Decimal(1) for ExpressionAttributeValues - the Table resource expects native Python types, not the low-level {"N": "1"} format.
https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/GettingStarted.Python.03.html#GettingStarted.Python.03.04
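Putting those fixes together, here is a minimal sketch of a working handler (assuming, as in your first version, that the table's partition key is a string attribute named id; note that json.dumps cannot serialize the Decimal that DynamoDB returns, so convert it to int to get a real number in the response):

import json
import os
from decimal import Decimal

import boto3

dynamodb = boto3.resource("dynamodb")
table = dynamodb.Table(os.environ["databaseName"])

def handler(event, context):
    # Atomically increment the counter; if_not_exists() supplies a
    # starting value of 0 so the very first call also succeeds
    ddbResponse = table.update_item(
        Key={"id": "VisitorCount"},
        UpdateExpression="SET VisitorCount = if_not_exists(VisitorCount, :zero) + :val",
        ExpressionAttributeValues={":val": Decimal(1), ":zero": Decimal(0)},
        ReturnValues="UPDATED_NEW",
    )

    # DynamoDB numbers come back as Decimal, which json.dumps cannot
    # serialize - convert to int so the API returns a number, not a string
    count = int(ddbResponse["Attributes"]["VisitorCount"])

    return {
        "isBase64Encoded": False,
        "statusCode": 200,
        "body": json.dumps({"VisitorCount": count}),
    }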
I have a sufficiently large dataset that I would like to bulk index as JSON objects into AWS OpenSearch.
I cannot see how to achieve this using any of: boto3, awswrangler, opensearch-py, elasticsearch, elasticsearch-py.
Is there a way to do this without issuing a Python request (PUT/POST) directly?
Note that this is not for: ElasticSearch, AWS ElasticSearch.
Many thanks!
I finally found a way to do it using opensearch-py, as follows.
First, establish the client:
# First fetch credentials from environment defaults
# If you can get this far you probably know how to tailor them
# for your particular situation. Otherwise SO is a safe bet :)
import boto3
from opensearchpy import OpenSearch, RequestsHttpConnection, AWSV4SignerAuth

credentials = boto3.Session().get_credentials()
region = 'eu-west-2'  # for example

# Now set up the AWS 'Signer'
auth = AWSV4SignerAuth(credentials, region)

# And finally the OpenSearch client
host = f"...{region}.es.amazonaws.com"  # fill in your hostname (minus the https://) here
client = OpenSearch(
    hosts=[{'host': host, 'port': 443}],
    http_auth=auth,
    use_ssl=True,
    verify_certs=True,
    connection_class=RequestsHttpConnection
)
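As a quick sanity check (an aside of mine, not part of the original recipe), you can ping the cluster before indexing anything:

# Should print cluster name and version details if host/auth are right
print(client.info())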
Phew! Let's create the data now:
# Spot the deliberate mistake(s) :D
document1 = {
    "title": "Moneyball",
    "director": "Bennett Miller",
    "year": "2011"
}

document2 = {
    "title": "Apollo 13",
    "director": "Richie Cunningham",
    "year": "1994"
}

data = [document1, document2]
TIP! Create the index if you need to -
my_index = 'my_index'

try:
    response = client.indices.create(my_index)
    print('\nCreating index:')
    print(response)
except Exception as e:
    # If, for example, my_index already exists, don't do much!
    print(e)
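Alternatively (again my aside, under the same assumptions), you can test for the index instead of catching the exception:

# Only create the index if it isn't there already
if not client.indices.exists(my_index):
    print(client.indices.create(my_index))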
This is where things go a bit nutty. I hadn't realised that every single bulk operation needs an, er, action - e.g. "index", "create", "delete" etc. - so let's define that now:
action = {
    "index": {
        "_index": my_index
    }
}
You can read all about the bulk REST API in the OpenSearch docs.
The next quirk is that the OpenSearch bulk API requires Newline Delimited JSON (see https://www.ndjson.org), which is basically JSON serialized as strings and separated by newlines. Someone wrote on SO that this "bizarre" API looked like one designed by a data scientist - far from taking offence, I think that rocks. (I agree ndjson is weird though.)
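For the two documents and the action defined above, the finished payload will look like this (pairs of lines: action, then document):

{"index": {"_index": "my_index"}}
{"title": "Moneyball", "director": "Bennett Miller", "year": "2011"}
{"index": {"_index": "my_index"}}
{"title": "Apollo 13", "director": "Richie Cunningham", "year": "1994"}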
Hideously, now let's build up the full JSON string, combining the data and actions. A helper function is at hand!
import json

def payload_constructor(data, action):
    # "All my own work"
    action_string = json.dumps(action) + "\n"
    payload_string = ""
    for datum in data:
        payload_string += action_string
        this_line = json.dumps(datum) + "\n"
        payload_string += this_line
    return payload_string
OK so now we can finally invoke the bulk API. I suppose you could mix in all sorts of actions (out of scope here) - go for it!
response = client.bulk(body=payload_constructor(data, action), index=my_index)
That's probably the most boring punchline ever but there you have it.
You can also just get (geddit) .bulk() to use index= alone and set the action to:

action = {"index": {}}
Hey presto!
Now, choose your poison - the other solution looks crazily shorter and neater.
PS The well-hidden opensearch-py documentation on this is located here.
import awswrangler as wr

# NB: this snippet is lifted from a class, hence the remaining self.* references
conn = wr.opensearch.connect(
    host=self.hosts,  # URL
    port=443,
    username=self.username,
    password=self.password
)
def insert_index_data(data, index_name='stocks', delete_index_data=False):
    """Bulk create.

    args: data [{doc1}, {doc2}, ...]
    """
    if delete_index_data:
        index_name = 'symbol'
        self.delete_es_index(index_name)  # helper method on the author's class

    resp = wr.opensearch.index_documents(
        conn,
        documents=data,
        index=index_name
    )
    print(resp)
    return resp
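A hypothetical usage sketch (the field names here are mine, purely illustrative):

docs = [
    {"symbol": "AAPL", "close": 191.24},
    {"symbol": "MSFT", "close": 410.54},
]
insert_index_data(docs, index_name='stocks')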
I have used the code below to bulk insert records from Postgres into OpenSearch (ES 7.2):
import sqlalchemy as sa
from sqlalchemy import text
import pandas as pd
import numpy as np
from opensearchpy import OpenSearch
from opensearchpy.helpers import bulk
import json
engine = sa.create_engine('postgresql+psycopg2://postgres:postgres@127.0.0.1:5432/postgres')
host = 'search-xxxxxxxxxx.us-east-1.es.amazonaws.com'
port = 443
auth = ('username', 'password') # For testing only. Don't store credentials in code.
# Create the client with SSL/TLS enabled, but hostname verification disabled.
client = OpenSearch(
hosts = [{'host': host, 'port': port}],
http_compress = True,
http_auth = auth,
use_ssl = True,
verify_certs = True,
ssl_assert_hostname = False,
ssl_show_warn = False
)
with engine.connect() as connection:
    result = connection.execute(text("select * from account_1_study_1.stg_pred where domain='LB'"))
    records = []
    for row in result:
        record = dict(row)
        record.update(record['item_dataset'])
        del record['item_dataset']
        records.append(record)

df = pd.DataFrame(records)
#df['Date'] = df['Date'].astype(str)
df = df.fillna("null")
print(df.keys())
documents = df.to_dict(orient='records')

# Earlier attempts, kept for reference:
#bulk(es, documents, index='search-irl-poc-dump', raise_on_error=True)
#response = client.bulk(body=documents, index='sample-index')
bulk(client, documents, index='search-irl-poc-dump', raise_on_error=True, refresh=True)
The following data is generated with the faker library. I am trying to learn and implement dynamic partitioning in Kinesis Firehose.
Sample payload Input
{
    "name": "Dr. Nancy Mcmillan",
    "phone_numbers": "8XXXXX",
    "city": "Priscillaport",
    "address": "908 Mitchell Views SXXXXXXXX 42564",
    "date": "1980-07-11",
    "customer_id": "3"
}
Sample producer code:
def main():
    import json
    import random

    import boto3
    from faker import Faker

    AWS_ACCESS_KEY = "XXXXX"
    AWS_SECRET_KEY = "XXX"
    AWS_REGION_NAME = "us-east-1"

    # Create the client once, outside the loop
    client = boto3.client(
        "kinesis",
        aws_access_key_id=AWS_ACCESS_KEY,
        aws_secret_access_key=AWS_SECRET_KEY,
        region_name=AWS_REGION_NAME,
    )

    for i in range(1, 13):
        faker = Faker()
        json_data = {
            "name": faker.name(),
            "phone_numbers": faker.phone_number(),
            "city": faker.city(),
            "address": faker.address(),
            "date": str(faker.date()),
            "customer_id": str(random.randint(1, 5))
        }
        print(json_data)

        hasher = MyHasher(key=json_data)
        res = hasher.get()

        response = client.put_record(
            StreamName='XXX',
            Data=json.dumps(json_data),
            PartitionKey='test',
        )
        print(response)
Here is the Lambda code, which works fine:
try:
    import json
    import boto3
    import base64
    from dateutil import parser
except Exception as e:
    pass

class MyHasher(object):
    def __init__(self, key):
        self.key = key

    def get(self):
        keys = str(self.key).encode("UTF-8")
        keys = base64.b64encode(keys)
        keys = keys.decode("UTF-8")
        return keys

def lambda_handler(event, context):
    print("Event")
    print(event)

    output = []

    for record in event["records"]:
        dat = base64.b64decode(record["data"])
        serialize_payload = json.loads(dat)
        print("serialize_payload", serialize_payload)

        json_new_line = str(serialize_payload) + "\n"

        hasherHelper = MyHasher(key=json_new_line)
        hash = hasherHelper.get()

        partition_keys = {"customer_id": serialize_payload.get("customer_id")}

        _ = {
            "recordId": record["recordId"],
            "result": "Ok",
            "data": hash,
            "metadata": {
                "partitionKeys": partition_keys
            }
        }
        print(_)

        output.append(_)

    print("*****************")
    print(output)

    return {"records": output}
Sample screenshots show it working fine.
Here are the settings on Firehose for dynamic partitioning.
For some reason, on AWS S3 I see an error folder, and all my messages go into it.
I have successfully implemented the Lambda transformation and made a video, which can be found below. I am currently stuck on the dynamic partitioning; I have tried reading several posts, but that didn't help.
https://www.youtube.com/watch?v=6wot9Z93vAY&t=231s
Thank you again and looking forward to hearing from you guys
References
https://docs.aws.amazon.com/firehose/latest/dev/dynamic-partitioning.html
https://www.youtube.com/watch?v=HcOVAFn-KhM
https://www.youtube.com/watch?v=PoaKgHdJgCE
https://medium.com/@bv_subhash/kinesis-firehose-performs-partitioning-based-on-timestamps-and-creates-files-in-s3-but-they-would-13efd51f6d39
https://www.amazonaws.cn/en/new/2021/s3-analytics-dynamic-partitioning-kinesis-data-firehose/
There are two prefix options for dynamic partitioning: 1) partitionKeyFromQuery, and 2) partitionKeyFromLambda. If you want Firehose to parse the record and extract the partition key itself, use the first option. If you want to provide the partition key after performing a transformation, use the second option.
As per your Firehose config, you are using a Lambda to provide the partition key (the second option), but the prefix is written for the first option. To resolve this, either disable inline parsing and use the second option in the Firehose prefix, !{partitionKeyFromLambda:customer_id}/, or remove the Lambda transformation and keep inline parsing.
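For example, a hedged sketch of the relevant S3 destination settings (matching the customer_id partition key set in the Lambda above):

S3 bucket prefix:       customer_id=!{partitionKeyFromLambda:customer_id}/
S3 bucket error prefix: errors/!{firehose:error-output-type}/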
I am trying to get the UserId after the creation of a user, using account_id = boto3.client('sts').get_caller_identity().get('some_other_user'), but the output I get is None. What could be the reason? I am very new to boto3 and Python, so it might be something very small.
import boto3
import sys
import json

iam = boto3.resource('iam')  # resource representing IAM
group = iam.Group('group1')  # name of group

created_user = iam.create_user(
    UserName='some_other_user'
)

account_id = boto3.client('sts').get_caller_identity().get('some_other_user')
print(account_id)

create_group_response = iam.create_group(GroupName='group1')

response = group.add_user(
    UserName='some_other_user'  # name of user
)

group = iam.Group('group1')

response = group.attach_policy(
    PolicyArn='arn:aws:iam::196687784:policy/boto-test'
)
The get_caller_identity() function returns a dict containing:
{
    'UserId': 'AIDAEXAMPLEHERE',
    'Account': '123456789012',
    'Arn': 'arn:aws:iam::123456789012:user/james'
}
So, to use this:
import boto3
sts = boto3.client('sts')
response = sts.get_caller_identity()
print('User ID:', response['UserId'])
Or you can use response.get('UserId') to get the user ID. The key to the user ID in the response dictionary is always the literal UserId. It doesn't vary (you cannot call response.get('james'), for example).
You cannot retrieve the identity of an arbitrary IAM principal using sts.get_caller_identity(). It only gives you the identity associated with the credentials that you implicitly used when making the call.
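If what you actually want is the UserId of the user you just created, a minimal sketch (my suggestion, using the User resource that iam.create_user() returns):

import boto3

iam = boto3.resource('iam')
created_user = iam.create_user(UserName='some_other_user')

# The new user's UserId comes straight back on the resource -
# no STS call needed
print('New user ID:', created_user.user_id)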
I am trying to check how many records a player has using the Hypixel API friends endpoint (api.hypixel.net/friends).
It keeps giving me a KeyError when trying to count the records. Here is what the API gives me:
{
"success": true,
"records": [{"_id":"5806841c0cf247f13be18b9d","uuidSender":"71ef88df5f7e482fb472f344965beba8","uuidReceiver":"976129e438b54a839944b1c0703d4da3","started":1476822044856},{"_id":"59589dde0cf250df95af825e","uuidSender":"71ef88df5f7e482fb472f344965beba8","uuidReceiver":"2c5bfce120c04ef1bfb3b798fe0d650e","started":1498979806692},{"_id":"5a444d820cf2604c12e0f6bd","uuidSender":"71ef88df5f7e482fb472f344965beba8","uuidReceiver":"ea703151981a409f8d0ff7cb782ab1c1","started":1514425730370},{"_id":"5aa595800cf24bd1104381cc","uuidSender":"71ef88df5f7e482fb472f344965beba8","uuidReceiver":"27af346e5bde40f0a665e808a331576f","started":1520801152354},{"_id":"5e2511f90cf289be2d0ea273","uuidSender":"71ef88df5f7e482fb472f344965beba8","uuidReceiver":"24a90aca074c4293a656f5fda047f816","started":1579487737061},{"_id":"5e36448c0cf2174287f94af7","uuidSender":"71ef88df5f7e482fb472f344965beba8","uuidReceiver":"1nmHNU1mERJEG452cx1FazPx3RpAuZ9vW","started":1580614796544},{"_id":"5e3f54190cf2ab010c5a21ce","uuidSender":"71ef88df5f7e482fb472f344965beba8","uuidReceiver":"63817f05823945a2b58f4ba1de5589a3","started":1581208601957},{"_id":"55bec5e0c8f2e017bca39176","uuidSender":"1nmHNU1mERJEG452cx1FazPx3RpAuZ9vW","uuidReceiver":"71ef88df5f7e482fb472f344965beba8","started":1438565856136},{"_id":"55e21268c8f21846db2f2566","uuidSender":"8fca5ebf02f74a369b13f3407ad4a9bc","uuidReceiver":"71ef88df5f7e482fb472f344965beba8","started":1440879208434},{"_id":"56c26b190cf2d1a91ec25e83","uuidSender":"1nmHNU1mERJEG452cx1FazPx3RpAuZ9vW","uuidReceiver":"71ef88df5f7e482fb472f344965beba8","started":1455581977070},{"_id":"5755edc20cf2db67507e3a2e","uuidSender":"0ce4597de5484c0e82a067fa0bf171df","uuidReceiver":"71ef88df5f7e482fb472f344965beba8","started":1465249218466},{"_id":"5851ed8d0cf2b9563974034d","uuidSender":"d3dd0059775e46b1b1a63b94a10d2450","uuidReceiver":"71ef88df5f7e482fb472f344965beba8","started":1481764237412},{"_id":"5d087a1f0cf2d7aebbf96fd6","uuidSender":"d6695d1ea7ae480bb2129a3b7d0269ad","uuidReceiver":"71ef88df5f7e482fb472f344965beba8","started":1560836639346},{"_id":"5d703a880cf299e651b71cbb","uuidSender":"cbd9e9009ee94d159d52dff284ad7bf8","uuidReceiver":"71ef88df5f7e482fb472f344965beba8","started":1567636104118},{"_id":"5deeff7f0cf2d87bd75df1a0","uuidSender":"c53c4524ad174fd78134223fddedc484","uuidReceiver":"71ef88df5f7e482fb472f344965beba8","started":1575944063334},{"_id":"5e0638f20cf24f983d2cb02c","uuidSender":"1nmHNU1mERJEG452cx1FazPx3RpAuZ9vW","uuidReceiver":"71ef88df5f7e482fb472f344965beba8","started":1577466098097},{"_id":"5e1e70e70cf2795e4f1322be","uuidSender":"85090a8b495d4856817fd6df1d4da0bd","uuidReceiver":"71ef88df5f7e482fb472f344965beba8","started":1579053287837},{"_id":"5e1fbe4b0cf2795e4f14a36f","uuidSender":"1nmHNU1mERJEG452cx1FazPx3RpAuZ9vW","uuidReceiver":"71ef88df5f7e482fb472f344965beba8","started":1579138635859},{"_id":"5e25910d0cf2a892e569db39","uuidSender":"63ac73044aca4b2f908baff858cf34b9","uuidReceiver":"71ef88df5f7e482fb472f344965beba8","started":1579520269103},{"_id":"5e712f890cf2d292e1148ba6","uuidSender":"1nmHNU1mERJEG452cx1FazPx3RpAuZ9vW","uuidReceiver":"71ef88df5f7e482fb472f344965beba8","started":1584476041946},{"_id":"5e819f550cf2675e4372109b","uuidSender":"8cd38d8f97a24090a43e2e0ce898c521","uuidReceiver":"71ef88df5f7e482fb472f344965beba8","started":1585553237302},{"_id":"5e82c5ad0cf2675e437308aa","uuidSender":"d618457dd6044256bdb287b0df8137d4","uuidReceiver":"71ef88df5f7e482fb472f344965beba8","started":1585628589448},{"_id":"5eae3e420cf26efdbd55b592","uuidSender":"a936c3468bec4a2685199a398212b62d","
uuidReceiver":"71ef88df5f7e482fb472f344965beba8","started":1588477506714},{"_id":"5ebd9da40cf22f431e9164d8","uuidSender":"1nmHNU1mERJEG452cx1FazPx3RpAuZ9vW","uuidReceiver":"71ef88df5f7e482fb472f344965beba8","started":1589484964296}]
}
Here is my code:
def get_friend_count(name):
    getUUID = f"https://api.mojang.com/users/profiles/minecraft/{name}"
    res = requests.get(getUUID)
    data = res.json()
    if data["id"] is None:
        return None
    returnUuid = (data["id"])
    url1 = f"https://api.hypixel.net/friends?key={API_KEY}&uuid=" + returnUuid
    res2 = requests.get(url1)
    data2 = res2.json()
    if data2["records"] is None:
        return None
    friend_count = len(data["records"])
    return "Friends: " + friend_count
getUUID gets the UUID from the requested username, which is then used to get the player's Hypixel stats.
Any help is appreciated, thanks!
Mojang API: https://wiki.vg/Mojang_API
Hypixel API: https://github.com/HypixelDev/PublicAPI/tree/master/Documentation
Did you just forget to use the response from requests.get(url1)?
In any case, from the Mojang API docs, it seems the endpoint you query (https://api.mojang.com/users/profiles/minecraft/) never returns a "records" key - hence the error when you do len(data["records"]), which counts from the Mojang response instead of the Hypixel one.
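Putting that together, a minimal corrected sketch (API_KEY as in your snippet; the count comes from the Hypixel response, data2, and is converted to str before concatenating):

import requests

def get_friend_count(name):
    res = requests.get(f"https://api.mojang.com/users/profiles/minecraft/{name}")
    data = res.json()
    if data.get("id") is None:
        return None

    res2 = requests.get(f"https://api.hypixel.net/friends?key={API_KEY}&uuid=" + data["id"])
    data2 = res2.json()
    if data2.get("records") is None:
        return None

    # Count the Hypixel records, not the Mojang response, and convert
    # the int before string concatenation
    return "Friends: " + str(len(data2["records"]))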
I would like to retrieve some metadata I added (using the console: x-amz-meta-my_variable) every time I upload an object to S3.
I have set up Lambda through the console to trigger every time an object is uploaded to my bucket.
I am wondering if I can use something like variable = event['Records'][0]['s3']['object']['my_variable'] to retrieve this data, or if I have to connect back to S3 with the bucket and key and then call some function to retrieve it?
Below is the code:
from __future__ import print_function
import json
import urllib
import boto3

print('Loading function')
s3 = boto3.client('s3')

def lambda_handler(event, context):
    # Get the object from the event and show its content type
    bucket = event['Records'][0]['s3']['bucket']['name']
    key = urllib.unquote_plus(event['Records'][0]['s3']['object']['key']).decode('utf8')
    # variable = event['Records'][0]['s3']['object']['my_variable']
    try:
        response = s3.get_object(Bucket=bucket, Key=key)
        # Call some function here?
        print("CONTENT TYPE: " + response['ContentType'])
        return response['ContentType']
    except Exception as e:
        print(e)
        print('Error getting object {} from bucket {}. Make sure they exist and your bucket is in the same region as this function.'.format(key, bucket))
        raise e
The metadata is not in the event but in the head object.
The HEAD operation retrieves metadata from an object without returning the object itself. This operation is useful if you are interested only in an object's metadata. To use HEAD, you must have READ access to the object.
A HEAD request has the same options as a GET operation on an object. The response is identical to the GET response except that there is no response body.
s3.head_object(Bucket=bucket, Key=key)
The code below is a snippet to get the metadata:
from __future__ import print_function
import boto3, logging

s3 = boto3.client('s3')
logger = logging.getLogger()
logger.setLevel(logging.INFO)

def lambda_handler(event, context):
    for record in event['Records']:
        bucket = record['s3']['bucket']['name']
        key = record['s3']['object']['key']
        response = s3.head_object(Bucket=bucket, Key=key)
        logger.info('Response: {}'.format(response))
        print("Author : " + response['Metadata']['author'])
        print("Description : " + response['Metadata']['description'])
Output:
[INFO] 2016-05-18T01:30:47.900Z 241f0cfc-1c98-12e6-b9a7-cf406f32a0dc Response: {u'AcceptRanges': 'bytes', u'ContentType': 'binary/octet-stream', 'ResponseMetadata': {'HTTPStatusCode': 200, 'HostId': 'K8JMVbEt5xA+qXuXOedb1y5nxuv6scMXnNH/rHVtxcg=', 'RequestId': 'D05BE92E55E0'}, u'LastModified': datetime.datetime(2016, 5, 17, 22, 54, 37, tzinfo=tzutc()), u'ContentLength': 94320, u'ETag': '"0e4d457d912bce9ff81952"', u'Metadata': {'author': 'Satyajit Ray', 'description':'He was an Indian filmmaker, widely regarded as one of the greatest filmmakers of the 20th century.'}}
Author : Satyajit Ray
Description : He was an Indian filmmaker, widely regarded as one of the greatest filmmakers of the 20th century.
You can get the metadata from the head object, where you have to pass an object containing the bucket and key.
E.g.: below is code (in Node.js) that you can use to get the metadata which was attached to the pre-signed URL while generating it from the aws-sdk.
// For generating a pre-signed URL with metadata
exports.getSignedUrl = async (myKey, metadata) => {
    const signedUrlExpireSeconds = 20000;
    const params = {
        Bucket: BUCKET,
        Key: myKey,
        Expires: signedUrlExpireSeconds,
        /* ACL: 'bucket-owner-full-control', ContentType: 'image/jpeg', */
        ContentType: 'image/jpeg',
        ACL: 'public-read',
        Metadata: metadata,
    };
    const url = await s3.getSignedUrl('putObject', params);
    return url;
};
// For obtaining the metadata from the bucket and key
const s3Object = reqBody.Records[0].s3;
const bucketName = s3Object.bucket.name;
const objectKey = s3Object.object.key;

const params = {
    Bucket: bucketName,
    Key: objectKey,
};

const data = await s3.headObject(params).promise();
const metadata = (!data) ? null : data.Metadata;