I have some data in S3 and I want to create a Lambda function that runs predictions against my deployed AWS SageMaker endpoint and then writes the outputs back to S3. Is it necessary in this case to create an API Gateway as described in this link? And what do I have to put in the Lambda function? I expect to specify where to find the data, how to invoke the endpoint, and where to put the results.
import boto3
import csv
import io

client = boto3.client('s3')      # low-level functional API
resource = boto3.resource('s3')  # high-level object-oriented API
my_bucket = resource.Bucket('demo-scikit-byo-iris')  # substitute your S3 bucket name

# read the input CSV from S3 and split it into rows
obj = client.get_object(Bucket='demo-scikit-byo-iris', Key='foo.csv')
lines = obj['Body'].read().decode('utf-8').splitlines()
reader = csv.reader(lines)

# rebuild the payload as a single CSV string for the endpoint
file = io.StringIO('\n'.join(lines))

# invoke the deployed SageMaker endpoint
runtime = boto3.client('runtime.sagemaker')
response = runtime.invoke_endpoint(
    EndpointName='nilm2',
    Body=file.getvalue(),
    ContentType='*/*',
    Accept='Accept')
output = response['Body'].read().decode('utf-8')
My data is a CSV file with two columns of floats and no headers. The problem is that lines returns a list of strings, with each row as an element of the list: ['11.55,65.23', '55.68,69.56', ...]. The invoke works well, but the response is also a single string: output = '65.23\n,65.23\n,22.56\n,...'.
So how do I save this output to S3 as a CSV file?
Thanks
If your Lambda function is scheduled, then you won't need an API Gateway. But if the predict action will be triggered by a user or by an application, for example, then you will need one.
When you call invoke_endpoint, you are actually calling a SageMaker endpoint, which is not the same as an API Gateway endpoint.
A common architecture with SageMaker is:
An API Gateway which receives a request, calls an authorizer, and then invokes your Lambda;
A Lambda which does some parsing of your input data, calls your SageMaker prediction endpoint, then handles the result and returns it to your application.
From the situation you describe, I can't tell whether your task is academic or a production one.
So, how can you save the data as a CSV file from your Lambda?
I believe you can just parse the output and then upload the file to S3. You can do the parsing manually or with a library, and with boto3 you can upload the file. The output of your model depends on your implementation in the SageMaker image. So, if you need the response data in another format, you may need to use a custom image. I normally use a custom image, in which I can define how I want to handle my data on requests/responses.
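For example, a minimal sketch of that parse-and-upload step, assuming output is the string produced by invoke_endpoint in your question; the 'predictions/foo_output.csv' key is just an illustration:

import boto3

s3 = boto3.client('s3')

# output looks like '65.23\n,65.23\n,22.56\n,...': split on the '\n,' separator
values = [v.strip() for v in output.split('\n,') if v.strip()]
csv_body = '\n'.join(values) + '\n'

# upload the predictions back to S3 (key name is just an example)
s3.put_object(
    Bucket='demo-scikit-byo-iris',
    Key='predictions/foo_output.csv',
    Body=csv_body.encode('utf-8'))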
For a production task, I certainly recommend you check out Batch Transform jobs in SageMaker. You provide an input file (an S3 path) and also a destination (another S3 path). SageMaker will run the batch predictions and persist a file with the results. Also, you won't need to keep your model deployed to an endpoint: when the job runs, it will create an instance for your model, download the data to predict, do the predictions, upload the output, and shut down the instance. You only need a trained model (a rough boto3 sketch follows the links below).
Here is some info about Batch Transform jobs:
https://docs.aws.amazon.com/sagemaker/latest/dg/how-it-works-batch.html
https://docs.aws.amazon.com/sagemaker/latest/dg/ex1-batch-transform.html
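For illustration, a rough boto3 sketch of starting such a job; the job name, model name, S3 paths, and instance type are placeholders you would replace with your own:

import boto3

sm = boto3.client('sagemaker')

# start a batch transform job against an already-trained model
sm.create_transform_job(
    TransformJobName='nilm2-batch-001',           # placeholder
    ModelName='nilm2-model',                      # placeholder
    TransformInput={
        'DataSource': {
            'S3DataSource': {
                'S3DataType': 'S3Prefix',
                'S3Uri': 's3://demo-scikit-byo-iris/input/'   # placeholder
            }
        },
        'ContentType': 'text/csv',
        'SplitType': 'Line'
    },
    TransformOutput={'S3OutputPath': 's3://demo-scikit-byo-iris/output/'},  # placeholder
    TransformResources={'InstanceType': 'ml.m5.large', 'InstanceCount': 1})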
I hope it helps; let me know if you need more info.
Regards.
I'm trying to do the following:
when I upload a file to my S3 storage, the Lambda picks up this JSON file and converts it into a CSV file.
How can I specify in the Lambda code which file it must pick up?
Example of my code running locally:
import pandas as pd
df = pd.read_json('movies.json')
df.to_csv('csv-movies.csv')
In this example I provide the name of the file... but how can I manage that in a Lambda?
I think I don't understand how Lambda works... could you give me an example?
Lambda spins up execution environments to handle your requests. When it initialises these environments, it'll pull the code you uploaded, and execute it when invoked.
Execution environments have a concept of ephemeral (temporary) storage with a default size of 512 MB.
Lambda doesn't have access to your files in S3 by default. You'd first need to download your file from S3 using something like the AWS SDK for Python. You can store it in the /tmp directory to make use of the ephemeral storage I mentioned earlier.
Once you've downloaded the file using the SDK, you can interact with it as you would if you were running this locally, like in your example.
On the flip side, you'd also need to use the SDK to upload the CSV back to S3 if you want to keep it beyond the lifecycle of that execution environment.
Something else you might want to explore in future is reading that file into memory and doing away with storing it in ephemeral storage altogether.
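A minimal sketch of that in-memory variant, assuming placeholder bucket/key names and that pandas is packaged with the function (it is not in the default Lambda runtime):

import io
import boto3
import pandas as pd

s3 = boto3.client('s3')

# read the JSON object straight into memory, no /tmp needed (names are placeholders)
obj = s3.get_object(Bucket='my-bucket', Key='movies.json')
df = pd.read_json(io.BytesIO(obj['Body'].read()))

# write the CSV back to S3, also from memory
buf = io.StringIO()
df.to_csv(buf, index=False)
s3.put_object(Bucket='my-bucket', Key='csv-movies.csv', Body=buf.getvalue())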
In order to achieve this you will need to use S3 as the event source for your Lambda. There's a useful tutorial for this provided by AWS themselves, with some sample Python code to assist you; you can view it here.
To break it down slightly further and answer how you get the name of the file: the Lambda handler will look similar to the following:
def lambda_handler(event, context):
What is important here is the event object. When your event source is the S3 bucket, you will be given the name of the bucket and the S3 key in that object, which is effectively the path to the file in the S3 bucket. With this information you can do some logic to decide if you want to download the file from that path. If you do, you can use the S3 get_object() API call as shown in the tutorial.
Once this file is downloaded it can be used like any other file on your local machine, so you can then proceed to convert the JSON to a CSV. Once it is converted you will presumably want to put it back in S3; for this you can use the S3 put_object() call and reuse the information in the event object to specify the path.
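Putting those pieces together, a rough sketch of such a handler; the 'csv/' output prefix is just one possible convention, and pandas would have to be packaged with the function:

import os
from urllib.parse import unquote_plus

import boto3
import pandas as pd

s3 = boto3.client('s3')

def lambda_handler(event, context):
    # the S3 event record tells us which object triggered the function
    record = event['Records'][0]
    bucket = record['s3']['bucket']['name']
    key = unquote_plus(record['s3']['object']['key'])  # keys arrive URL-encoded

    # download the JSON to the ephemeral /tmp storage
    local_json = os.path.join('/tmp', os.path.basename(key))
    s3.download_file(bucket, key, local_json)

    # convert it to CSV locally, then upload the result back to S3
    local_csv = local_json.rsplit('.', 1)[0] + '.csv'
    pd.read_json(local_json).to_csv(local_csv, index=False)
    s3.upload_file(local_csv, bucket, 'csv/' + os.path.basename(local_csv))

    return {'converted': key}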
I am trying to get multiple objects from an S3 bucket using Python, with the AWS CLI installed and configured. I can currently get a single file using this code.
import boto3

url = boto3.client('s3').generate_presigned_url(
    ClientMethod='get_object',
    Params={'Bucket': 'test-bucket', 'Key': '00001.png'},
    ExpiresIn=3600)
print(url)
However, I need to generate the same for 100 other image files; how can I possibly do this?
Run the code 100 times -- seriously!
You should separate out the client generation, such as:
s3_client = boto3.client('s3')
url = s3_client.generate_presigned_url(...)
It's a very quick command and doesn't require a call to AWS, so you can repeat or loop through the last line many times.
Each object will require a separate pre-signed URL because permission is being generated for just one object at a time.
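For example, a quick sketch that loops over every key under a prefix; the bucket name and 'images/' prefix are placeholders:

import boto3

s3_client = boto3.client('s3')

# list the objects once, then generate one pre-signed URL per key
paginator = s3_client.get_paginator('list_objects_v2')
urls = {}
for page in paginator.paginate(Bucket='test-bucket', Prefix='images/'):
    for obj in page.get('Contents', []):
        urls[obj['Key']] = s3_client.generate_presigned_url(
            ClientMethod='get_object',
            Params={'Bucket': 'test-bucket', 'Key': obj['Key']},
            ExpiresIn=3600)

for key, url in urls.items():
    print(key, url)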
I have a Python function which writes to an audio file. I want to get the file onto my local system as soon as I trigger the AWS Lambda. I don't want to use an S3 bucket for this.
I have checked the method to store the file in the /tmp/ folder in AWS, but I don't know how to get the file onto my local file system.
If there is any other way please let me know, or how to get the audio file from the Lambda /tmp/ folder to my local machine.
I have successfully written to the /tmp/ folder and it works fine.
with open('/tmp/filename.wav', 'wb') as f:
    f.write(content)
As soon as I trigger the Lambda function from API Gateway, I want the 'wav' file on my local machine.
You said that you want to save the file to your local machine as soon as the Lambda is invoked, but I think what you mean is that as soon as your Lambda is done with whatever it is doing, you want to save the resultant file to your local machine.
If I'm correct about the above, then subject to the limits of Lambda and API Gateway you can return the audio file as the result of the function: simply have the function return the resulting file in the response.
As per the AWS documentation, the maximum payload size from API Gateway is 10 MB, and API Gateway has a timeout of 30 seconds (see here). That being said, the maximum invocation payload of Lambda is 6 MB (see here). These two combined mean that your response from Lambda has to be less than 6 MB and complete within 30 seconds. If the response is more than 6 MB or takes more than 30 seconds, you will receive an error.
Although you mentioned that you don't wish to use S3, a better pattern, especially if your file size could be larger than 6 MB, would be to use S3 to store the file and have Lambda/API Gateway return a "302 Found" redirect with the location of the file in S3; your browser will still automatically download it to your local machine, but you won't have to worry about API Gateway timeouts or Lambda response limits.
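A rough sketch of that redirect pattern, assuming a Lambda proxy integration and placeholder bucket/key names; a pre-signed URL is used here so the object doesn't need to be public:

import boto3

s3 = boto3.client('s3')

def lambda_handler(event, context):
    # ... generate the audio and upload it to S3 first (bucket/key are placeholders) ...
    url = s3.generate_presigned_url(
        ClientMethod='get_object',
        Params={'Bucket': 'my-audio-bucket', 'Key': 'output/filename.wav'},
        ExpiresIn=300)

    # with a Lambda proxy integration, API Gateway turns this into an HTTP 302
    return {
        'statusCode': 302,
        'headers': {'Location': url},
        'body': ''
    }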
One possibility would be for your Lambda to write its output to Dropbox using the Dropbox API. Dropbox is very good at keeping files synced to local machines.
You'll need to be careful with your keys - I suggest AWS Secrets Manager for this, which you can get to easily from Python.
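A hedged sketch of that idea, assuming a secret named 'dropbox/token' already exists in Secrets Manager and that the official dropbox SDK is packaged with the function:

import boto3
import dropbox

# fetch the Dropbox token from Secrets Manager instead of hard-coding it
secrets = boto3.client('secretsmanager')
token = secrets.get_secret_value(SecretId='dropbox/token')['SecretString']

# upload the generated audio file to Dropbox, which then syncs to local machines
dbx = dropbox.Dropbox(token)
with open('/tmp/filename.wav', 'rb') as f:
    dbx.files_upload(f.read(), '/lambda-output/filename.wav')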
I have roughly 80 TB of images hosted in an S3 bucket which I need to send to an API for image classification. Once the images are classified, the API will forward the results to another endpoint.
Currently, I am thinking of using boto to interact with S3 and perhaps Apache Airflow to download these images in batches and forward them to the classification API, which will forward the results of the classification to a web app for display.
In the future I want to automatically send any new image added to the S3 bucket to the API for classification. To achieve this I am hoping to use AWS Lambda and S3 notifications to trigger this function.
Would this be the best practice for such a solution?
Thank you.
For your future scenarios, yes, that approach would be sensible:
Configure Amazon S3 Events to trigger an AWS Lambda function when a new object is created
The Lambda function can download the object (to /tmp/) and call the remote API
Make sure the Lambda function deletes the temporary file before exiting, since the Lambda container might be reused and there is a 512 MB storage limit
Please note that the Lambda function will trigger on a single object, rather than in batches.
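A minimal sketch of that flow, using only the standard library for the HTTP call; the classification API URL is a placeholder:

import json
import os
import urllib.request

import boto3

s3 = boto3.client('s3')
API_URL = 'https://example.com/classify'  # placeholder for your classification API

def lambda_handler(event, context):
    record = event['Records'][0]
    bucket = record['s3']['bucket']['name']
    key = record['s3']['object']['key']

    # download the new image to ephemeral storage
    local_path = os.path.join('/tmp', os.path.basename(key))
    s3.download_file(bucket, key, local_path)

    try:
        # send the image bytes to the remote classification API
        with open(local_path, 'rb') as f:
            req = urllib.request.Request(
                API_URL, data=f.read(),
                headers={'Content-Type': 'application/octet-stream'})
            with urllib.request.urlopen(req) as resp:
                result = json.loads(resp.read())
    finally:
        # clean up /tmp in case this execution environment is reused
        os.remove(local_path)

    return result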
New JSON files are dumped into an S3 bucket daily. I have to create a solution which picks up the latest file when it arrives, parses the JSON, and loads it into the Snowflake data warehouse. Can someone please share your thoughts on how we can achieve this?
There are a number of ways to do this depending on your needs. I would suggest creating an event to trigger a Lambda function.
https://docs.aws.amazon.com/lambda/latest/dg/with-s3.html
Another option may be to create an SQS message when the file lands on S3 and have an EC2 instance poll the queue and process it as necessary (see the polling sketch after the links below).
https://docs.aws.amazon.com/AmazonS3/latest/dev/NotificationHowTo.html
https://boto3.amazonaws.com/v1/documentation/api/latest/guide/sqs-example-long-polling.html
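For the SQS route, a small long-polling sketch an EC2 worker could run; the queue URL is a placeholder and process() stands in for your own parsing/loading logic:

import boto3

sqs = boto3.client('sqs')
QUEUE_URL = 'https://sqs.us-east-1.amazonaws.com/123456789012/s3-json-files'  # placeholder

def process(body):
    # placeholder: parse the S3 event JSON, fetch the file, load it into Snowflake
    print(body)

while True:
    # long poll for up to 20 seconds to cut down on empty responses
    resp = sqs.receive_message(
        QueueUrl=QUEUE_URL,
        MaxNumberOfMessages=10,
        WaitTimeSeconds=20)

    for msg in resp.get('Messages', []):
        process(msg['Body'])
        sqs.delete_message(QueueUrl=QUEUE_URL, ReceiptHandle=msg['ReceiptHandle'])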
edit: Here is a more detailed explanation of how to create events from S3 and trigger Lambda functions. The documentation is provided by Snowflake:
https://docs.snowflake.net/manuals/user-guide/data-load-snowpipe-rest-lambda.html
Look into Snowpipe; it lets you do that within the system, making it (possibly) much easier.
There are some aspects to be considered, such as: is it batch or streaming data, do you want to retry loading the file in case of wrong data or a wrong format, and do you want to make it a generic process able to handle different file formats/types (CSV/JSON) and stages.
In our case we have built a generic S3-to-Snowflake load using Python and Luigi, and also implemented the same using SSIS, but for CSV/TXT files only.
In my case, I have a Python script which gets information about the bucket with boto.
Once I detect a change, I call the Snowpipe REST endpoint insertFiles.
Phasing:
detect the S3 change
get the S3 object path
parse the content and transform it to CSV in S3 (same bucket, or another one Snowpipe can connect to)
call the Snowpipe REST API
What you need (a rough sketch follows this list):
Create a user with a public key
Create your stage on Snowflake with AWS credentials in order to access S3
Create your pipe on Snowflake with your user role
Sign a JWT
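Here is the rough sketch mentioned above, using the snowflake-ingest helper library, which signs the JWT for you; the account, host, user, pipe, key path, and staged file name are all placeholders:

from snowflake.ingest import SimpleIngestManager, StagedFile

# load the private key whose public half is attached to the Snowflake user (placeholder path)
with open('/path/to/rsa_key.p8') as f:
    private_key = f.read()

ingest_manager = SimpleIngestManager(
    account='myaccount',                        # placeholder
    host='myaccount.snowflakecomputing.com',    # placeholder
    user='SNOWPIPE_USER',                       # placeholder
    pipe='MYDB.MYSCHEMA.MYPIPE',                # placeholder
    private_key=private_key)

# tell Snowpipe which staged files to load (paths are relative to the pipe's stage)
resp = ingest_manager.ingest_files([StagedFile('transformed/data.csv', None)])
print(resp)  # the response indicates whether the files were queued for loading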
I also tried with a Talend job with TOS BigData.
Hope it helps.