The filename variable is used to get the name of the latest file.
My aim is to monitor a folder and, whenever a new file arrives, automatically upload it to an S3 bucket using boto3.
import os
import time
from subprocess import call

import boto3
from watchdog.observers import Observer
from watchdog.events import FileSystemEventHandler

session = boto3.Session(aws_access_key_id='aws_access_key_id',
                        aws_secret_access_key='aws_secret_access_key',
                        region_name='region_name')
s3 = session.client('s3')

class Watcher:
    def __init__(self):
        self.dir = os.path.abspath('D:\\project')
        self.observer = Observer()

    def run(self):
        event_handler = Handler()
        self.observer.schedule(event_handler, self.dir, recursive=True)
        self.observer.start()
        try:
            while True:
                time.sleep(5)
        except:
            self.observer.stop()
            print("Error")
        self.observer.join()

class Handler(FileSystemEventHandler):
    @staticmethod
    def on_any_event(event):
        if event.is_directory:
            return None
        elif event.event_type == 'created':
            print("Received created event - %s." % event.src_path)
            s3.upload_file(Filename=event.src_path, Bucket='bucketname', Key='test-file-1')

if __name__ == '__main__':
    w = Watcher()
    w.run()
FileNotFoundError: [WinError 2] The system cannot find the file specified
As @alexhall mentioned in the comment, the s3.meta.client.upload_file method will upload a file. You can read the boto3 S3 client's upload_file documentation here: https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/s3.html#S3.Client.upload_file. The example there is a bit roundabout, though: it first creates an s3 resource rather than an s3 client, and then, because the resource does not actually have a method to upload a file, it reaches back to the client via meta.client. You might as well create and use an s3 client directly for uploads.
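For illustration, both of the following end up doing the same upload (the file, bucket, and key names here are just placeholders):
import boto3

# Via a resource, reaching back to the underlying client:
boto3.resource('s3').meta.client.upload_file('local.txt', 'my-bucket', 'remote.txt')

# Via a client directly:
boto3.client('s3').upload_file('local.txt', 'my-bucket', 'remote.txt')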
You are also relying on the fact that boto3 uses the default session when you create the s3 resource like you did:
boto3.resource('s3')
This would work fine if you were running the code on Lambda, or on an EC2 instance with an IAM role configured to give it access to S3. But I think you are running this outside AWS, in which case you can create a boto3.Session() first using your credentials, and then have a client (or resource) use that session:
aws_access_key_id = '<AWS_ACCESS_KEY_ID>'
aws_secret_access_key = '<AWS_SECRET_ACCESS_KEY>'
region_name = 'us-east-1'

session = boto3.Session(aws_access_key_id=aws_access_key_id,
                        aws_secret_access_key=aws_secret_access_key,
                        region_name=region_name)
s3 = session.client('s3')
You can read about Session configuration here: https://boto3.amazonaws.com/v1/documentation/api/latest/guide/configuration.html
As mentioned above, because you are only uploading a file and do not seem to do anything else with it, you may as well create an s3 client directly, rather than creating an s3 resource as you did and then fetching the client from it via 'meta.client'.
Instead of the command = ... line, you simply use:
s3.upload_file(Filename, Bucket='aaaaa', Key='test-file-1')
You can delete the last line. You would use call if you were running an OS/system command rather than something within Python.
Not sure if you are doing this to learn Python (boto3). If so, congrats.
If not, the AWS CLI already provides this kind of functionality (for example, aws s3 sync), so you could keep everything else in your code and just shell out to the AWS CLI for the upload.
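A minimal sketch of that approach (assuming the AWS CLI is installed and on the PATH, with a placeholder bucket name):
from subprocess import call

# Mirror the watched folder to the bucket; the CLI picks up credentials from its own config.
call(["aws", "s3", "sync", "D:\\project", "s3://bucketname"])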
I'm new to Python. For my project I'm using boto3 to access AWS S3 from the PyCharm IDE.
I installed the boto3 and pyboto3 packages, then created a Python file and successfully created a bucket and transferred files to S3 from my local machine using boto3.
Later I created another Python file in the same working directory, following the same steps, but this time I'm not able to connect to AWS and the API calls don't go through.
So I'm wondering: can boto3 only be used from one Python file, and not from another Python file in the same directory?
I tried creating both an s3 client and an s3 resource, but no luck.
Please advise: does boto3 have any such limitation?
Below is the Python code:
import boto3
import os

bucket_name = '*****'

def s3_client():
    s3 = boto3.client('s3')
    """:type:pyboto3:s3"""
    return s3

def s3_resource():
    s3 = boto3.resource('s3')
    return s3

def create_bucket(bucket_name):
    val = s3_client().create_bucket(
        Bucket=bucket_name,
        CreateBucketConfiguration={
            'LocationConstraint': 'ap-south-1'
        })
    return val

def upload_file():
    s3 = s3_resource().meta.client.upload_file('d:/s3_load2.csv', bucket_name, 'snowflake.csv')
    return s3

def upload_small_file():
    s3 = s3_client().upload_file('d:/s3_load2.csv', bucket_name, 'snowflake.csv')
    return s3

# calling
upload_small_file()
Perhaps the AWS credentials weren't set in the environment where you ran the 2nd script. Or maybe the credentials you were using while running the 1st script have already expired. Try getting your AWS credentials and setting them when you instantiate a boto3 client or resource, as documented:
import boto3

client = boto3.client(
    's3',
    aws_access_key_id=ACCESS_KEY,
    aws_secret_access_key=SECRET_KEY,
    aws_session_token=SESSION_TOKEN  # This is only required for temporary credentials
)
Or you can try setting them as environment variables:
export AWS_ACCESS_KEY_ID="some key"
export AWS_SECRET_ACCESS_KEY="some key"
export AWS_SESSION_TOKEN="some token" # This is only required for temporary credentials
Or put them in a configuration file. See the docs for the complete list of options.
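If you do use the shared credentials/config files, you can also select a named profile explicitly (the profile name below is a placeholder):
import boto3

# Assumes a [my-profile] section exists in ~/.aws/credentials or ~/.aws/config
session = boto3.Session(profile_name="my-profile")
s3 = session.client("s3")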
First off, I'm pretty new to AWS, and it took me a lot of trial and error to get my Lambda function to execute my Python script, which sits on an EC2 instance.
If I run my code manually through the command line on my EC2 instance, it works perfectly: it calls the requested API and saves down the data.
If I call my script through a Lambda function over SSH, it stops executing at the API call. The Lambda reports that everything ran, but it didn't; I get no output message saying there was an exception, and there's nothing in the CloudWatch log either. I know it starts to execute my code, because if I put print statements before the API calls, I see them in the CloudWatch log.
Any ideas to help out a noob?
Here is my lambda code:
import time
import boto3
import json
import paramiko

def lambda_handler(event, context):
    ec2 = boto3.resource('ec2', region_name='eu-west-2')
    instance_id = 'removed_id'
    instance = ec2.Instance(instance_id)

    # Start the instance
    instance.start()

    s3_client = boto3.client('s3')

    # Download private key file from secure S3 bucket
    # and save it inside /tmp/ folder of lambda event
    s3_client.download_file('removed_bucket', 'SSEC2.pem',
                            '/tmp/SSEC2.pem')

    # Allowing few seconds for the download to complete
    time.sleep(2)

    # Giving some time to start the instance completely
    time.sleep(60)

    ssh = paramiko.SSHClient()
    ssh.set_missing_host_key_policy(paramiko.AutoAddPolicy())
    privkey = paramiko.RSAKey.from_private_key_file('/tmp/SSEC2.pem')

    # username is most likely 'ec2-user' or 'root' or 'ubuntu'
    # depending upon your ec2 AMI
    ssh.connect(
        instance.public_dns_name, username='ec2-user', pkey=privkey
    )

    print('Executing')
    stdin, stdout, stderr = ssh.exec_command(
        '/home/ec2-user/miniconda3/bin/python /home/ec2-user/api-calls/main.py')
    stdin.flush()
    data = stdout.read().splitlines()
    for line in data:
        print(line)
    ssh.close()

    # Stop the instance
    # instance.stop()

    return {
        'statusCode': 200,
        'body': json.dumps('Execution successful ')
    }
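One small note on the handler above: it only reads stdout, so anything the remote script writes to stderr (including a Python traceback) is dropped. A hedged tweak, shown here as a drop-in replacement for the exec_command section of the handler, would also drain stderr so remote errors land in the CloudWatch log:
stdin, stdout, stderr = ssh.exec_command(
    '/home/ec2-user/miniconda3/bin/python /home/ec2-user/api-calls/main.py')
for line in stdout.read().splitlines():
    print(line)
# Drain stderr as well, so a remote traceback becomes visible in CloudWatch
for line in stderr.read().splitlines():
    print('STDERR:', line)
ssh.close()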
Edit:
Okay, slight update: it's not falling over on the API call, it's actually stopping when it tries to open a config file, which is stored at "config/config.json". Obviously this works in the EC2 environment when I'm executing manually, so this must have something to do with the environment in EC2 not being the same when the job is triggered from elsewhere? Here is the exact code:
@staticmethod
def get_config():
    with open("config/config.json", "r") as read_file:
        data = json.load(read_file)
    return data
Problem solved: I need to use full path names when executing the code remotely.
with open("/home/ec2-user/api-calls/config/config.json", "r") as read_file:
clients.py
"""
Wraps boto3 to provide client and resource objects that connect to Localstack when
running tests if LOCALSTACK_URL env variable is set, otherwise connect to real AWS.
"""
import os
import boto3
# Port 4566 is the universal "edge" port for all services since localstack v0.11.0,
# so LOCALSTACK_URL can be of the form "http://hostname:4566".
# "None" means boto3 will use the default endpoint for the real AWS services.
ENDPOINT_URL = os.getenv("LOCALSTACK_URL", None)
def s3_resource():
    return boto3.resource("s3", endpoint_url=ENDPOINT_URL)

def s3_client():
    return boto3.client("s3", endpoint_url=ENDPOINT_URL)
mycode.py
import botocore.exceptions

from mymod.aws.clients import s3_resource

s3 = s3_resource()

def __get_object_etag(self, s3_dir_to_download_file_from: str, file_name: str) -> str:
    bucket, key = s3.deconstruct_s3_path(
        f"{s3_dir_to_download_file_from}/{file_name}"
    )
    try:
        etag_value = s3_resource().Object(bucket, key).e_tag
        return etag_value
    except botocore.exceptions.ClientError:
        raise
I am wondering if, in mycode.py, I am doing the correct error handling for when the file I am looking for does not exist on S3. I basically expect the key to be there, and if it is not I want to raise an error; I do not want to proceed further, as this code will be used as part of a pipeline where each step relies on the previous one.
I would love to know whether I am handling the error correctly, and if not, how one would handle this case.
From what I understand, exceptions are supposed to handle errors and let you proceed, but in my case I do not want to proceed with the rest of my code if the file I am looking for on S3 is missing.
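For reference, a common pattern (a sketch, with hypothetical bucket/key arguments) is to inspect the error code so that only the "object is missing" case is turned into an explicit failure, while any other client error still propagates unchanged:
import boto3
import botocore.exceptions

s3 = boto3.resource("s3")

def get_etag(bucket: str, key: str) -> str:
    try:
        return s3.Object(bucket, key).e_tag
    except botocore.exceptions.ClientError as err:
        if err.response["Error"]["Code"] in ("404", "NoSuchKey"):
            # Fail loudly so the pipeline stops here rather than proceeding
            raise FileNotFoundError(f"s3://{bucket}/{key} does not exist") from err
        raise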
We are currently creating a website that is essentially an upgrade of an old existing one. We would like to keep the old posts (which include images) on the new website. The old files are kept on an EC2 instance, while the new website is serverless and keeps all its files in S3.
My question: is there any way I could transfer the old files (from EC2) to the new S3 bucket using Python? I would like to rename and relocate the files according to the new filename/filepath pattern that we devs decided on.
There is boto3, the AWS SDK for Python.
https://boto3.amazonaws.com/v1/documentation/api/latest/guide/s3-uploading-files.html
import logging
import boto3
from botocore.exceptions import ClientError

def upload_file(file_name, bucket, object_name=None):
    """Upload a file to an S3 bucket

    :param file_name: File to upload
    :param bucket: Bucket to upload to
    :param object_name: S3 object name. If not specified then file_name is used
    :return: True if file was uploaded, else False
    """
    # If S3 object_name was not specified, use file_name
    if object_name is None:
        object_name = file_name

    # Upload the file
    s3_client = boto3.client('s3')
    try:
        response = s3_client.upload_file(file_name, bucket, object_name)
    except ClientError as e:
        logging.error(e)
        return False
    return True
You can write a script using the S3 upload_file function, then run it locally on your EC2 instance.
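For the rename/relocate part, a minimal sketch (the directory, bucket name, and key pattern below are placeholders for whatever scheme your team decided on) could walk the old files and compute the new key for each upload:
import os
import boto3

s3_client = boto3.client("s3")

OLD_FILES_DIR = "/var/www/old-site/uploads"   # placeholder path on the EC2 instance
BUCKET = "new-site-assets"                    # placeholder bucket name

def new_key_for(file_name):
    # Placeholder for the real renaming rule the team decided on
    return "legacy/posts/" + file_name.lower()

for root, _dirs, files in os.walk(OLD_FILES_DIR):
    for file_name in files:
        local_path = os.path.join(root, file_name)
        key = new_key_for(file_name)
        s3_client.upload_file(local_path, BUCKET, key)
        print("Uploaded %s -> s3://%s/%s" % (local_path, BUCKET, key))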
I have the following code:
import json
import requests
import boto3
import os
import logging

logger = logging.getLogger()
logger.setLevel(logging.INFO)

def lambda_handler(event, context):
    print('Received event:' + str(event))
    bucket = event['Records'][0]['s3']['bucket']['name']
    key = event['Records'][0]['s3']['object']['key']
    S3Client = boto3.client('s3', 'us-east-1')
    url = "https://s3.amazonaws.com/%s/%s" % (bucket, key)
    ID_Resp = S3Client.head_object(Bucket=bucket, Key=key)
    print(ID_Resp)
    version_id = ID_Resp['ResponseMetadata']['HTTPHeaders']['x-amz-version-id']
Here I'm receiving an S3 event from a bucket on object create. Ultimately I want to POST metadata about the file to a custom API. I'm calling the head_object method so I can get the x-amz-version-id and know which version of the file was submitted to the API. When I run the last line on my local machine it works, but when I run it from within the Lambda environment, the key/value pair for x-amz-version-id is missing altogether. Does anyone know why this might be? I can't find anything in the docs about it.
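A hedged sketch of a more defensive way to read the version, continuing from the handler above (note that S3 only returns a version when versioning is enabled on the bucket, so the field can legitimately be absent):
ID_Resp = S3Client.head_object(Bucket=bucket, Key=key)

# boto3 also surfaces the version as a top-level 'VersionId' key when S3 returns one
version_id = ID_Resp.get('VersionId')
if version_id is None:
    # The bucket may not have versioning enabled, or the object predates it
    print('No version id returned for s3://%s/%s' % (bucket, key))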