Looking for an example.
We insert a single record into DynamoDB.
We need to retrieve that item with Python as soon as it is inserted into the DynamoDB table.
In other words, something like a get-record call that keeps watching the table for the latest item and retrieves it as soon as it is inserted.
You typically want an event-based solution without having to poll the database which means using DynamoDB Streams and Lambda functions.
You can, however, also write a Python client that polls DynamoDB Streams.
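As a rough illustration of the event-based route, here is a minimal sketch of a Lambda handler attached to the table's stream. It assumes the stream is configured to include new images; the helper name `extract_inserted_items` is made up for this example, and the `print` call is only a placeholder for your own processing.

```python
def extract_inserted_items(event):
    """Return the NewImage of every INSERT record in a DynamoDB Streams event."""
    inserted = []
    for record in event.get("Records", []):
        if record.get("eventName") != "INSERT":
            continue  # skip MODIFY and REMOVE events
        # NewImage is present only if the stream view type includes new images
        new_image = record.get("dynamodb", {}).get("NewImage")
        if new_image is not None:
            inserted.append(new_image)
    return inserted


def lambda_handler(event, context):
    items = extract_inserted_items(event)
    for item in items:
        print(item)  # placeholder: replace with your own processing
    return {"inserted": len(items)}
```

The items arrive in DynamoDB's typed format (for example `{"id": {"S": "1"}}`), so they may need converting before use.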
Related
I have configured a DMS migration instance that replicates data from MySQL into an AWS Kinesis stream, but I noticed that when I process the Kinesis records I pick up duplicate records. This does not happen for every record.
How do I prevent these duplicate records from being pushed to the Kinesis data stream or the S3 bucket?
I'm using a Lambda function to process the records, so I thought of adding logic to de-duplicate the data, but I'm not sure how to do that without persisting the data somewhere. I need to process the data in real time, so persisting the data would not be ideal.
Regards
Pragesan
I added a global variable that stores the pk of each record, so each invocation checks the previous pk value, and if it is different I insert the value.
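A minimal sketch of that idea, under some assumptions: the Kinesis payload is JSON, and the primary key lives under `data.id` (the `get_pk` helper and that field path are hypothetical; adjust them to your record layout). Note that module-level state only survives while the same Lambda execution environment stays warm, so this catches back-to-back duplicates at best.

```python
import base64
import json

# Module-level state survives between invocations only while the same
# Lambda execution environment stays warm; it is not shared across
# concurrent instances and is lost on a cold start.
_last_pk = None


def get_pk(payload):
    # Hypothetical: assumes the primary key is stored under data.id
    return payload["data"]["id"]


def lambda_handler(event, context):
    global _last_pk
    unique = []
    for record in event["Records"]:
        # Kinesis delivers each record's data base64-encoded
        payload = json.loads(base64.b64decode(record["kinesis"]["data"]))
        pk = get_pk(payload)
        if pk == _last_pk:
            continue  # same pk as the previous record: treat as a duplicate
        _last_pk = pk
        unique.append(payload)
    return unique
```

Because it only compares against the previous pk, duplicates that are not adjacent, or that land on different Lambda instances, would still get through.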
Is there a way of doing a batch insert/update of records into AWS Aurora using "pre-formed" PostgreSQL statements, using Python?
My scenario: I have an AWS Lambda that receives data changes (insert/modify/remove) from DynamoDB via Kinesis and then needs to apply them to a PostgreSQL instance in AWS Aurora.
All I've managed to find by searching the Internet is the use of Boto3 via the "batch_execute_statement" command in the RDS Data Service client, where one needs to populate a list of parameters for each individual record.
If possible, I would like a mechanism where I can supply many "pre-formed" INSERT/UPDATE/DELETE PostgreSQL statements to the database in a batch operation.
Many thanks in advance for any assistance.
I used Psycopg2 and an SQLAlchemy engine's raw connection (instead of Boto3) and looped through my list of SQL statements, executing each one in turn.
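A sketch of what that loop might look like. The function name `execute_statements` is made up for illustration, and the standard library's `sqlite3` is used as a stand-in connection so the example runs without a database server; against Aurora you would pass a psycopg2 connection instead (for example the one returned by `engine.raw_connection()`). Wrapping the loop in a single commit is my assumption, not something the answer above states.

```python
import sqlite3


def execute_statements(connection, statements):
    """Run each pre-formed SQL statement in order inside one transaction."""
    cursor = connection.cursor()
    try:
        for statement in statements:
            cursor.execute(statement)
        connection.commit()  # make the whole batch visible at once
    except Exception:
        connection.rollback()  # undo the partial batch on any failure
        raise
    finally:
        cursor.close()


# Stand-in connection so the sketch runs anywhere; with Aurora this would be
# a psycopg2 connection, e.g. the one returned by engine.raw_connection().
conn = sqlite3.connect(":memory:")
execute_statements(conn, [
    "CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)",
    "INSERT INTO users (id, name) VALUES (1, 'alice')",
    "INSERT INTO users (id, name) VALUES (2, 'bob')",
    "UPDATE users SET name = 'carol' WHERE id = 2",
    "DELETE FROM users WHERE id = 1",
])
rows = conn.execute("SELECT id, name FROM users").fetchall()
```

Executing pre-formed statements like this is only safe if they were built from trusted data, since no parameter binding is involved.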
We want to export data from DynamoDB to a file. We have around 150,000 records, each about 430 bytes. It would be a periodic activity, once a week. Can we do that with Lambda, given that Lambda has a maximum execution time of 15 minutes?
Is there a better option using Python or the UI? I'm unable to export more than 100 records from the UI.
One really simple option is to use the Command Line Interface tools:
aws dynamodb scan --table-name YOURTABLE --output text > outputfile.txt
This would give you tab-delimited output. You can run it as a cron job for regular output.
The scan wouldn't take anything like 15 minutes (probably just a few seconds). So you wouldn't need to worry about your Lambda timing out if you did it that way.
You can export your data from DynamoDB in a number of ways.
The simplest way would be a full table scan:
import boto3

your_table = 'YOURTABLE'  # placeholder: replace with your table name
dynamodb = boto3.client('dynamodb')
response = dynamodb.scan(
    TableName=your_table,
    Select='ALL_ATTRIBUTES')
data = response['Items']
# A single scan call returns at most 1 MB of data, so keep following
# LastEvaluatedKey until the whole table has been read.
while 'LastEvaluatedKey' in response:
    response = dynamodb.scan(
        TableName=your_table,
        Select='ALL_ATTRIBUTES',
        ExclusiveStartKey=response['LastEvaluatedKey'])
    data.extend(response['Items'])
# save your data as csv here
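For the "save as csv" step, one possible sketch using only the standard library is below. The helper names `flatten_item` and `items_to_csv` are made up for this example, the sample `data` list stands in for the scan result, and the flattening is deliberately naive: it assumes mostly scalar attributes (S, N, BOOL) and writes nested types (M, L, sets) as their Python string form.

```python
import csv
import io


def flatten_item(item):
    """Turn {'name': {'S': 'x'}, 'age': {'N': '3'}} into {'name': 'x', 'age': '3'}."""
    # Each attribute is a one-entry dict of {type_code: value}; keep the value.
    return {name: next(iter(typed.values())) for name, typed in item.items()}


def items_to_csv(items, out):
    rows = [flatten_item(item) for item in items]
    # Items may have different attributes, so use the union as the header
    fieldnames = sorted({key for row in rows for key in row})
    writer = csv.DictWriter(out, fieldnames=fieldnames, restval="")
    writer.writeheader()
    writer.writerows(rows)


# Sample items in the low-level format returned by the boto3 client
data = [
    {"id": {"S": "a1"}, "count": {"N": "3"}},
    {"id": {"S": "b2"}, "active": {"BOOL": True}},
]
buffer = io.StringIO()
items_to_csv(data, buffer)
csv_text = buffer.getvalue()
```

To write to disk instead of an in-memory buffer, pass a file opened with `open(path, "w", newline="")` as `out`.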
But if you want to do it every x days, what I would recommend is:
Create your first dump from your table with the code above.
Then create a DynamoDB trigger to a Lambda function that will receive all your table changes (insert, update, delete), and append those changes to your CSV file. The code would be something like:
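def lambda_handler(event, context):
    for record in event['Records']:
        # get the changes here and save it
        change_type = record['eventName']  # 'INSERT', 'MODIFY' or 'REMOVE'
        print(change_type, record['dynamodb'].get('Keys'))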
Since you will receive only your table updates, you don't need to worry about Lambda's 15-minute execution limit.
You can read more about DynamoDB Streams and Lambda here: DynamoDB Streams and AWS Lambda Triggers
And if you want to work on your data, you can always create an AWS Glue job or an EMR cluster.
Guys, we resolved it using AWS Lambda: 150,000 records (each 430 bytes) were written to a CSV file in 2.2 minutes using the maximum available memory (3008 MB). We created an event rule to run it on a periodic basis. The time and size are given so that anyone can estimate how much they can do with Lambda.
You can refer to an existing question on Stack Overflow about exporting a DynamoDB table as a CSV.
I'm currently building a pipeline that reads data from MongoDB every time a new document gets inserted and sends it to an external data source after some preprocessing. The preprocessing and the sending to the external data source work well as designed.
The problem, however, is that I can't read the data from MongoDB. I'm trying to build a trigger that reads data from MongoDB when a certain collection gets updated and then sends it to Python. I'm not considering polling MongoDB since it's too resource-intensive.
I've found this library, mongotriggers (https://github.com/drorasaf/mongotriggers/), and am now taking a look at it.
In summary, how can I build a trigger that sends data from MongoDB to Python when a new document gets inserted into a specific collection?
Any comment or feedback would be appreciated.
Thanks in advance.
Best
Gee
In MongoDB v3.6+, you can now use MongoDB Change Streams. Change streams allow applications to access real-time data changes without the complexity and risk of tailing the oplog. Applications can use change streams to subscribe to all data changes on a single collection, a database, or an entire deployment, and immediately react to them.
For example, to listen to a change stream for newly inserted documents:
import logging
import pymongo

# db is assumed to be an existing pymongo Database handle
try:
    with db.collection.watch([{'$match': {'operationType': 'insert'}}]) as stream:
        for insert_change in stream:
            # Do something
            print(insert_change)
except pymongo.errors.PyMongoError:
    # The ChangeStream encountered an unrecoverable error or the
    # resume attempt failed to recreate the cursor.
    logging.error('...')
pymongo.collection.Collection.watch() is available from PyMongo 3.6.0+.
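To connect this to a preprocessing step, the loop body could hand each inserted document to a callback. The sketch below is one way to structure that; `forward_inserts` and the `send` callback are hypothetical names, and the function accepts any iterable of change events so it can be tried without a running MongoDB.

```python
def forward_inserts(stream, send):
    """Call send(document) for each insert event read from stream."""
    forwarded = 0
    for change in stream:
        if change.get("operationType") != "insert":
            continue  # ignore updates, deletes and other event types
        send(change["fullDocument"])  # the newly inserted document
        forwarded += 1
    return forwarded


# Stand-in events shaped like change stream documents
sample_events = [
    {"operationType": "insert", "fullDocument": {"_id": 1, "name": "a"}},
    {"operationType": "delete", "documentKey": {"_id": 1}},
    {"operationType": "insert", "fullDocument": {"_id": 2, "name": "b"}},
]
received = []
count = forward_inserts(sample_events, received.append)
```

With a real change stream the loop blocks waiting for new events, so the function would normally not return; you would pass the `stream` object from `watch()` and your own preprocessing function as `send`.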
I'm currently using SQLAlchemy with two distinct session objects. In one session, I am inserting rows into a MySQL database. In the other session I am querying that database for the max row id. However, the second session does not see the latest data in the database. If I query the database manually, I see the correct, higher max row id.
How can I force the second session to query the live database?
The first session needs to commit to flush changes to the database.
first_session.commit()
A Session holds pending objects in memory and flushes them to the database together (a unit of work, for efficiency). Until first_session commits its transaction, those changes are not visible to second_session, which reads from the database over its own connection.
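The visibility rule can be seen with two plain database connections. In this sketch the standard library's `sqlite3` stands in for the two SQLAlchemy sessions and for MySQL, so it is an analogy rather than SQLAlchemy code; the table and variable names are made up for the example.

```python
import os
import sqlite3
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "demo.db")
    writer = sqlite3.connect(path)  # plays the role of first_session
    reader = sqlite3.connect(path)  # plays the role of second_session

    writer.execute("CREATE TABLE rows (id INTEGER PRIMARY KEY)")
    writer.execute("INSERT INTO rows (id) VALUES (1)")
    writer.commit()

    # The writer inserts a second row but has not committed yet
    writer.execute("INSERT INTO rows (id) VALUES (2)")
    before_commit = reader.execute("SELECT MAX(id) FROM rows").fetchone()[0]

    # After the commit, the reader's next query sees the new row
    writer.commit()
    after_commit = reader.execute("SELECT MAX(id) FROM rows").fetchone()[0]

    writer.close()
    reader.close()
```

Here `before_commit` is the old maximum and `after_commit` the new one, which mirrors the second session only seeing the higher id once the first session has committed.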
I had a similar problem; for some reason I had to commit both sessions, even the one that is only reading.
This might be a problem with my code, though. I cannot use the same session, as the code will run on different machines. Also, the SQLAlchemy documentation says that each session should be used by one thread only, although one reading and one writing should not be a problem.