I am trying to upload a CSV file from my local machine to an AWS S3 bucket. Below is the code I am using, but it doesn't seem to upload the file to the S3 folder I defined. Could anyone assist?
import boto3
from botocore.client import Config
ACCESS_KEY_ID = 'accesskeyid'
ACCESS_SECRET_KEY = 'secretkeyid'
BUCKET_NAME = 'bucketname'
data = open('/desktop/file.csv', 'rb')
s3 = boto3.resource(
    's3',
    aws_access_key_id=ACCESS_KEY_ID,
    aws_secret_access_key=ACCESS_SECRET_KEY,
    config=Config(signature_version='s3v4')
)
s3.Bucket(BUCKET_NAME).put_object(Key='/sub-folder/sub-folder2/file.csv', Body=data)
print ("Uploaded successfully")
Could anyone help me figure out where I am going wrong? Thanks.
You need to remove the / from the beginning of the Key argument.
Using your existing code, the file path will be: BUCKET_NAME//sub-folder/sub-folder2/file.csv.
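For example, keeping the rest of your code the same, the corrected call would be (same bucket and key names as in your snippet):
# No leading slash: the object lands at s3://bucketname/sub-folder/sub-folder2/file.csv
s3.Bucket(BUCKET_NAME).put_object(Key='sub-folder/sub-folder2/file.csv', Body=data)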
I would like to write a test that mocks the download of a file from S3 and replaces it locally with an actual file that exists on my machine. I took inspiration from this post. The idea is the following:
from moto import mock_s3
import boto3
def dl(src_f, dest_f):
    s3 = boto3.resource('s3')
    s3.Bucket('fake_bucket').download_file(src_f, dest_f)
@mock_s3
def _create_and_mock_bucket():
    # Create fake bucket and mock it
    bucket = "fake_bucket"
    # We need to create the bucket since this is all in Moto's 'virtual' AWS account
    file_path = "some_real_file.txt"
    s3 = boto3.client("s3", region_name="us-east-1")
    s3.create_bucket(Bucket=bucket)
    s3.put_object(Bucket=bucket, Key=file_path, Body="")
    dl(file_path, 'some_other_real_file.txt')
_create_and_mock_bucket()
Now some_other_real_file.txt exists, but it is not a copy of some_real_file.txt. Any idea on how to do that?
If 'some_real_file.txt' already exists on your system, you should use upload_file instead:
https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/s3.html#S3.Client.upload_file
For your example:
bucket = "fake_bucket"
file_path = "some_real_file.txt"
s3 = boto3.client("s3", region_name="us-east-1")
s3.create_bucket(Bucket=bucket)
s3_resource = boto3.resource('s3')
s3_resource.meta.client.upload_file(file_path, bucket, file_path)
Your code currently creates an empty object in S3 (since Body=""), and that empty object is exactly what is being downloaded to 'some_other_real_file.txt'.
Notice that if you change the Body parameter to contain some text, that exact content will be downloaded to 'some_other_real_file.txt'.
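Putting it together, a minimal sketch of the full test with upload_file (same names as in your question; assumes moto and boto3 are installed):
from moto import mock_s3
import boto3

def dl(src_f, dest_f):
    s3 = boto3.resource('s3')
    s3.Bucket('fake_bucket').download_file(src_f, dest_f)

@mock_s3
def _create_and_mock_bucket():
    bucket = "fake_bucket"
    file_path = "some_real_file.txt"  # a real file on your machine
    s3 = boto3.client("s3", region_name="us-east-1")
    s3.create_bucket(Bucket=bucket)
    # Upload the real local file instead of an empty Body
    boto3.resource('s3').meta.client.upload_file(file_path, bucket, file_path)
    dl(file_path, 'some_other_real_file.txt')

_create_and_mock_bucket()
# some_other_real_file.txt is now a copy of some_real_file.txt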
I'm doing a project where I read files from an S3 bucket, get rid of all NA values, then upload them to a different S3 bucket. I've been watching a Lambda tutorial and example code, but I have a hard time understanding how it really works.
My goal is to read any file in the S3 bucket and, using a Lambda function, drop all the NA values, then upload the result to a different S3 bucket. But I don't really understand what is going on. I read the documentation, but it wasn't very helpful for me.
How can I make the code below read CSV files from the S3 bucket, drop all NA values, and then upload the result to the new S3 bucket?
import json
import os
import boto3
import csv
def lambda_handler(event, context):
    for record in event['Records']:
        bucket = record['s3']['bucket']['name']
        file_key = record['s3']['object']['key']
        s3 = boto3.client('s3')
        csv_file = s3.get_object(Bucket=bucket, Key=file_key)
        csv_content = csv_file['Body'].read().split(b'\n')
        csv_data = csv.DictReader(csv_content)
Any links to documentation or videos, and any advice, will be appreciated.
Uploading files
import logging
import boto3
from botocore.exceptions import ClientError

def upload_file(file_name, bucket, object_name=None):
    """Upload a file to an S3 bucket

    :param file_name: File to upload
    :param bucket: Bucket to upload to
    :param object_name: S3 object name. If not specified then file_name is used
    :return: True if file was uploaded, else False
    """
    # If S3 object_name was not specified, use file_name
    if object_name is None:
        object_name = file_name

    # Upload the file
    s3_client = boto3.client('s3')
    try:
        response = s3_client.upload_file(file_name, bucket, object_name)
    except ClientError as e:
        logging.error(e)
        return False
    return True
s3 download_file
import boto3
s3 = boto3.resource('s3')
s3.meta.client.download_file('mybucket', 'hello.txt', '/tmp/hello.txt')
Now you simply combine these calls however you like to process your CSV files; how to process and upload to S3 efficiently would be a completely different topic.
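For illustration, here is a minimal sketch of a Lambda handler that ties these calls together: it reads the triggering CSV from the source bucket, drops any row with an empty or 'NA' value, and writes the cleaned file to a destination bucket. The destination bucket name ('my-output-bucket') and the NA convention are assumptions; if you bundle pandas as a Lambda layer you could use pd.read_csv(...).dropna() instead.
import csv
import io
import boto3

s3 = boto3.client('s3')
DEST_BUCKET = 'my-output-bucket'  # assumption: replace with your destination bucket

def lambda_handler(event, context):
    for record in event['Records']:
        bucket = record['s3']['bucket']['name']
        file_key = record['s3']['object']['key']

        # Read the CSV from the source bucket
        obj = s3.get_object(Bucket=bucket, Key=file_key)
        reader = csv.DictReader(io.StringIO(obj['Body'].read().decode('utf-8')))
        rows = list(reader)

        # Keep only rows where every field has a value (drop NA/empty rows)
        clean_rows = [r for r in rows if all(v not in (None, '', 'NA') for v in r.values())]

        # Write the cleaned CSV to the destination bucket under the same key
        out = io.StringIO()
        writer = csv.DictWriter(out, fieldnames=reader.fieldnames)
        writer.writeheader()
        writer.writerows(clean_rows)
        s3.put_object(Bucket=DEST_BUCKET, Key=file_key, Body=out.getvalue().encode('utf-8'))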
There are plenty of answers in this post: How to upload a file to directory in S3 bucket using boto
You can check this one as well if you're curious; it gives some idea of how to process larger files.
Step 4: Create the Lambda function that splits input data
I am successfully downloading an image file to my local computer from my S3 bucket using the following:
import os
import boto3
import botocore
files = ['images/dog_picture.png']
bucket = 'animals'
s3 = boto3.resource('s3')
for file in files:
    s3.Bucket(bucket).download_file(file, os.path.basename(file))
However, when I try to specify the directory to which the image should be saved on my local machine as is done in the docs:
s3.Bucket(bucket).download_file(file, os.path.basename(file), '/home/user/storage/new_image.png')
I get:
ValueError: Invalid extra_args key '/home/user/storage/new_image.png', must be one of: VersionId, SSECustomerAlgorithm, SSECustomerKey, SSECustomerKeyMD5, RequestPayer
I must be doing something wrong but I'm following the example in the docs. Can someone help me specify a local directory?
Looking at the docs, you're providing an extra parameter.
import boto3
s3 = boto3.resource('s3')
s3.Bucket('mybucket').download_file('hello.txt', '/tmp/hello.txt')
From the docs, hello.txt is the name of the object on the bucket and /tmp/hello.txt is the path on your device, so the correct way would be
s3.Bucket(bucket).download_file(file, '/home/user/storage/new_image.png')
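If you want to keep the original filename but save it into a specific local directory, you can join the directory with the object's basename (the directory path here is just an example):
import os

local_dir = '/home/user/storage'  # example local directory
for file in files:
    # join the target directory with the object's base filename
    local_path = os.path.join(local_dir, os.path.basename(file))
    s3.Bucket(bucket).download_file(file, local_path)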
I created an S3 bucket and placed both a data.csv and a data.json file inside it. I then created a Sagemaker notebook and specified this S3 bucket in the IAM role.
This now works from inside the notebook:
import pandas as pd
from sagemaker import get_execution_role
bucket='my-sagemaker-bucket'
data_key = 'data.csv'
data_location = 's3://{}/{}'.format(bucket, data_key)
data = pd.read_csv(data_location)
But this errors, saying the file doesn't exist:
import json
from sagemaker import get_execution_role
bucket='my-sagemaker-bucket'
data_key = 'data.json'
data_location = 's3://{}/{}'.format(bucket, data_key)
data = json.load(open(data_location))
Anyone know why I can read the csv but not the json? I also can't shutil.copy the csv to the notebook's current working directory (also says file doesn't exist). I'm not very well versed with S3 buckets or Sagemaker, so not sure if this is a permissions/policy issue or something else.
Your SageMaker execution role might have insufficient rights to access your S3 bucket. The default IAM SageMaker execution role has the permission "AmazonSageMakerFullAccess", which uses the S3 request condition "s3:ExistingObjectTag/SageMaker = true".
So maybe you could try simply tagging your S3 bucket (Tag: SageMaker:true). Check your IAM settings.
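For reference, a minimal sketch of adding that tag to the bucket with boto3 (bucket name from the question; adjust to whatever your IAM policy actually checks):
import boto3

s3 = boto3.client('s3')
# Tag the bucket so the execution role's S3 tag condition can match
s3.put_bucket_tagging(
    Bucket='my-sagemaker-bucket',
    Tagging={'TagSet': [{'Key': 'SageMaker', 'Value': 'true'}]}
)
Alternatively, you can read the JSON directly with pandas: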
import pandas as pd
bucket='my-sagemaker-bucket'
data_key = 'data.json'
data_location = 's3://{}/{}'.format(bucket, data_key)
pd.read_json(data_location) # , orient='columns', typ='series'
Pandas can handle S3 URLs using your AWS credentials, so you could use pd.read_csv or pd.read_json instead of json.load. The suggestion from @Michael_S should work.
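If you prefer to stick with the json module, read the object through boto3 instead of open(), which only understands local paths (bucket and key from your question):
import json
import boto3

s3 = boto3.client('s3')
obj = s3.get_object(Bucket='my-sagemaker-bucket', Key='data.json')
data = json.loads(obj['Body'].read().decode('utf-8'))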
I am using AWS Sagemaker and trying to upload a data folder into S3 from Sagemaker. What I am trying to do is upload my data into the s3_train_data directory (the directory already exists in S3). However, it doesn't upload it to that bucket, but to a default bucket that has been created, and in turn it creates a new folder path from the s3_train_data variable.
Code to upload the data into the directory:
import os
import sagemaker
from sagemaker import get_execution_role
sagemaker_session = sagemaker.Session()
role = get_execution_role()
bucket = <bucket name>
prefix = <folders1/folders2>
key = <input>
s3_train_data = 's3://{}/{}/{}/'.format(bucket, prefix, key)
# the path 'data' is the folder in the Jupyter instance that contains all the training data
inputs = sagemaker_session.upload_data(path= 'data', key_prefix= s3_train_data)
Is the problem in the code or more in how I created the notebook?
You could look at the sample notebooks for how to upload data to an S3 bucket.
There are many ways to do it; I am just giving you hints.
Also, you forgot to create a boto3 session to access the S3 bucket.
Here is one way to do it.
import os
import urllib.request
import boto3
def download(url):
    filename = url.split("/")[-1]
    if not os.path.exists(filename):
        urllib.request.urlretrieve(url, filename)

def upload_to_s3(channel, file):
    s3 = boto3.resource('s3')
    data = open(file, "rb")
    key = channel + '/' + file
    s3.Bucket(bucket).put_object(Key=key, Body=data)

# caltech-256
download('http://data.mxnet.io/data/caltech-256/caltech-256-60-train.rec')
upload_to_s3('train', 'caltech-256-60-train.rec')
download('http://data.mxnet.io/data/caltech-256/caltech-256-60-val.rec')
upload_to_s3('validation', 'caltech-256-60-val.rec')
link : https://buildcustom.notebook.us-east-2.sagemaker.aws/notebooks/sample-notebooks/introduction_to_amazon_algorithms/imageclassification_caltech/Image-classification-fulltraining.ipynb
Another way to do it.
bucket = '<your_s3_bucket_name_here>'  # enter your s3 bucket where you will copy data and model artifacts
prefix = 'sagemaker/breast_cancer_prediction' # place to upload training files within the bucket
# do some processing then prepare to push the data.
f = io.BytesIO()
smac.write_numpy_to_dense_tensor(f, train_X.astype('float32'), train_y.astype('float32'))
f.seek(0)
boto3.Session().resource('s3').Bucket(bucket).Object(os.path.join(prefix, 'train', train_file)).upload_fileobj(f)
Link : https://buildcustom.notebook.us-east-2.sagemaker.aws/notebooks/sample-notebooks/introduction_to_applying_machine_learning/breast_cancer_prediction/Breast%20Cancer%20Prediction.ipynb
Youtube link : https://www.youtube.com/watch?v=-YiHPIGyFGo - how to pull the data in S3 bucket.
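One more option, closer to your original code: upload_data takes the bucket as a separate argument, and key_prefix should be a plain prefix rather than a full s3:// URL. A minimal sketch under that assumption, reusing the placeholders from your question:
import sagemaker

sagemaker_session = sagemaker.Session()

bucket = '<bucket name>'
prefix = '<folders1/folders2>'
key = '<input>'

# Pass the bucket explicitly and keep key_prefix a plain prefix (no 's3://')
inputs = sagemaker_session.upload_data(
    path='data',  # local folder containing the training data
    bucket=bucket,
    key_prefix='{}/{}'.format(prefix, key)
)
# inputs is the resulting 's3://<bucket>/<key_prefix>' URI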