I'd like to write Python Lambda code to move files within the same S3 bucket:
[same S3 bucket]
/location-as-is/file.jpg
[same S3 bucket]
/location-to-be/file.jpg
How can I do that?
Thank you.
In order to get this to work you will need a few things. First is the Lambda code itself. You can use the Python SDK, boto3, to make the copy call. Here is an example of how to copy your file:
import json
import boto3

s3 = boto3.resource('s3')

def lambda_handler(event, context):
    my_bucket = "example-bucket"
    current_object_key = "fileA/keyA.jpg"
    new_object_key = "fileB/keyB.jpg"
    copy_source = {
        'Bucket': my_bucket,
        'Key': current_object_key
    }
    s3.meta.client.copy(copy_source, my_bucket, new_object_key)
You will also need to make sure your Lambda execution role has the proper S3 read and write permissions, and that your S3 bucket policy allows your Lambda role to access the bucket.
You can use the legacy boto library (boto 2, the predecessor of boto3) for this purpose, as below:
import boto

c = boto.connect_s3()
src_buc = c.get_bucket('Source_Bucket')
sink_buc = c.get_bucket('Sink_Bucket')
and then you can iterate over all your keys to copy the content:
for k in src_buc.list():
    # copy each key to the sink bucket
    sink_buc.copy_key(k.name, src_buc.name, k.name)
I'm trying to delete everything from a specific folder in an S3 bucket with AWS Lambda using Python. The Lambda runs successfully; however, the files still exist in "folder1". There are no sub-folders under this folder, only files.
Could someone please assist? Here is the code:
import json
import os
import boto3

def lambda_handler(event, context):
    s3 = boto3.resource('s3')
    deletefile_bucket = s3.Bucket('test_bucket')
    response = deletefile_bucket.delete_objects(
        Delete={
            'Objects': [
                {
                    'Key': 'folder1/'
                },
            ],
        }
    )
The delete_objects() call requires a list of specific object keys to delete. It does not support wildcards and it does not delete the contents of "subdirectories".
You will need to obtain a listing of all objects under the prefix and then explicitly request that those objects be deleted.
The delete_objects() call accepts at most 1000 objects per request.
I would like to write a test to mock the download of a function from s3 and replace it locally with an actual file that exists of my machine. I took inspiration from this post. The idea is the following:
from moto import mock_s3
import boto3

def dl(src_f, dest_f):
    s3 = boto3.resource('s3')
    s3.Bucket('fake_bucket').download_file(src_f, dest_f)

@mock_s3
def _create_and_mock_bucket():
    # Create fake bucket and mock it
    bucket = "fake_bucket"
    # We need to create the bucket since this is all in Moto's 'virtual' AWS account
    file_path = "some_real_file.txt"
    s3 = boto3.client("s3", region_name="us-east-1")
    s3.create_bucket(Bucket=bucket)
    s3.put_object(Bucket=bucket, Key=file_path, Body="")
    dl(file_path, 'some_other_real_file.txt')

_create_and_mock_bucket()
Now some_other_real_file.txt exists, but it is not a copy of some_real_file.txt. Any idea on how to do that?
If 'some_real_file.txt' already exists on your system, you should use upload_file instead:
https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/s3.html#S3.Client.upload_file
For your example:
bucket = "fake_bucket"
file_path = "some_real_file.txt"
s3 = boto3.client("s3", region_name="us-east-1")
s3.create_bucket(Bucket=bucket)

s3_resource = boto3.resource('s3')
s3_resource.meta.client.upload_file(file_path, bucket, file_path)
Your code currently creates an empty file in S3 (since Body=""), and that empty content is exactly what is being downloaded to 'some_other_real_file.txt'.
Notice that if you change the Body parameter to contain some text, that exact content will be downloaded to 'some_other_real_file.txt'.
I'm getting an error when I try to delete all files under a specific folder.
The problem is here: 'Key': 'testpart1/*.*'
I would also like to delete files older than 30 days. Please help me with the script.
import boto3

s3 = boto3.resource('s3')
my_bucket = s3.Bucket('my-bucket')
response = my_bucket.delete_objects(
    Delete={
        'Objects': [
            {
                'Key': 'testpart1/*.*' # wildcards are not supported in keys
            }
        ]
    }
)
The code below will delete all files under the prefix recursively:
import boto3
s3 = boto3.resource('s3')
my_bucket = s3.Bucket('my-bucket')
response = my_bucket.objects.filter(Prefix="testpart1/").delete()
Please check https://stackoverflow.com/a/59146547/4214976 to filter out the object based on date.
How to list S3 bucket Delimiter paths?
Basically I want to list all of the "directories" and/or "sub-directories" in an S3 bucket. I know these don't physically exist. I want all the objects that contain the delimiter, returning only the key path before the delimiter. Starting under a prefix would be even better, but the bucket level is enough.
Example S3 Bucket:
root.json
/2018/cats/fluffy.png
/2018/cats/gary.png
/2018/dogs/rover.png
/2018/dogs/jax.png
I would like to then do something like:
s3_client = boto3.client('s3')
s3_client.list_objects(only_show_delimiter_paths=True)
Result
/2018/
/2018/cats/
/2018/dogs/
I don't see any way to do this natively using: https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/s3.html#S3.Client.list_objects
I could pull all the object names and do this in my application code but that seems inefficient.
The Amazon S3 page in boto3 has this example:
List top-level common prefixes in Amazon S3 bucket
This example shows how to list all of the top-level common prefixes in an Amazon S3 bucket:
import boto3

client = boto3.client('s3')
paginator = client.get_paginator('list_objects')
result = paginator.paginate(Bucket='my-bucket', Delimiter='/')
for prefix in result.search('CommonPrefixes'):
    print(prefix.get('Prefix'))
But, it only shows top-level prefixes.
So, here's some code to print all the 'folders':
import boto3

client = boto3.client('s3')
objects = client.list_objects_v2(Bucket='my-bucket')
keys = [o['Key'] for o in objects['Contents']]
folders = {k[:k.rfind('/')+1] for k in keys if k.rfind('/') != -1}
print('\n'.join(folders))
I have one Python Lambda function that lists each file in an S3 bucket (code below). What I am not clear on is how to pass each file object to another Lambda function as an input, with separate executions. The goal is for x files in the list to trigger x executions of the second Lambda concurrently (i.e. if there are 20 files in the list, execute the second Lambda 20 times, with each file passed to it respectively). The file will be used in the second Lambda function for a join in Pandas.
Really appreciate any help!
List of files (lambda 1)
import boto3

#Start session with Profile
session = boto3.session.Session(profile_name='<security_token_service_profile>')
client = session.client('s3') #low-level functional API
resource = session.resource('s3') #high-level object-oriented API

#State S3 bucket
my_bucket = resource.Bucket('<bucket>') #substitute this for your S3 bucket name

#List all files
files = list(my_bucket.objects.filter(Prefix='<path_to_file>'))
print(files)
Thank you @jarmod! That worked. For those who might need this in the future, my Lambda script above has been modified as follows:
import boto3
import json

print('[INFO] Loading Function')

def lambda_handler(event, context):
    print("[INFO] Received event: " + json.dumps(event, indent=2))

    #Start session with region details for authentication
    session = boto3.session.Session(region_name='<region>')
    client = session.client('s3') #low-level functional API
    resource = session.resource('s3') #high-level object-oriented API

    #Identify S3 bucket
    my_bucket = resource.Bucket('<bucket>') #substitute this for your S3 bucket name

    #List all files
    files = list(my_bucket.objects.filter(Prefix='<file_path>'))
    for file in files:
        payload = json.dumps({"key": file.key})
        print(payload)

        client_lambda = session.client('lambda')
        client_lambda.invoke(
            FunctionName='<lambda_function_name_to_call>',
            InvocationType='Event',
            LogType='None',
            Payload=payload
        )

if __name__ == '__main__':
    lambda_handler({}, None)
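For completeness, the second function (hypothetical; its only assumption is the {"key": ...} payload shape sent above) would read the key from its event, for example:

```python
import json

def lambda_handler(event, context):
    """Second Lambda: receives one file key per asynchronous invocation."""
    key = event["key"]
    print(f"[INFO] Processing S3 object: {key}")
    # ... download the object with boto3 here and use it in the Pandas join ...
    return {"statusCode": 200, "body": json.dumps({"processed": key})}
```

Because the first function uses InvocationType='Event', each invocation runs asynchronously, which is what allows the 20 executions to proceed concurrently.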