Changing metadata when uploading a file to S3 with Python

I have an HTML file that I am uploading to S3 using Python. For some reason, S3 adds system-defined metadata saying that the file's Content-Type is "binary/octet-stream".
I need to change this value to "text/html". I can do it manually, but I want it to be done automatically when I upload the file.
I tried the following code:
from pathlib import Path

metadata = {
    "Content-Type": "text/html"
}
s3_file_key = str(Path("index.html"))
local_file_path = Path("~", "index.html").expanduser()
s3 = session.resource("s3")
bucket = s3.Bucket(bucket_name)
with open(local_file_path, "rb") as local_file:
    bucket.put_object(Key=s3_file_key, Body=local_file, Metadata=metadata)
but the result was that the file ended up with two metadata keys: the system-defined Content-Type (still "binary/octet-stream") plus a user-defined x-amz-meta-content-type key.
I couldn't find any documentation about how to change system-defined metadata.
Thanks for the help.

Use the MetadataDirective parameter:
bucket.put_object(Key=s3_file_key, Body=local_file, Metadata=metadata, MetadataDirective='REPLACE')
MetadataDirective -- Specifies whether the metadata is copied from the source object or replaced with metadata provided in the request ('COPY' | 'REPLACE').
S3 - Boto3 Docs
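Note that in boto3 the MetadataDirective parameter is documented on copy_object rather than put_object, and it controls what happens to metadata during a copy. A common way to change the metadata of an object that already exists is therefore to copy the object onto itself with MetadataDirective='REPLACE'. A minimal sketch, reusing bucket_name and s3_file_key from the question:

import boto3

s3_client = boto3.client("s3")

# Copy the object onto itself, replacing both the system Content-Type
# and any user-defined metadata in a single request.
s3_client.copy_object(
    Bucket=bucket_name,
    Key=s3_file_key,
    CopySource={"Bucket": bucket_name, "Key": s3_file_key},
    ContentType="text/html",
    Metadata={},                      # user-defined metadata, left empty here
    MetadataDirective="REPLACE",
)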

I was also having a similar problem, and I found out that you need to use the ContentType parameter of the put_object function.
bucket.put_object(Key=s3_file_key, Body=local_file, ContentType='text/html')
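For the upload in the question, a minimal sketch combining both ideas is shown below: ContentType sets the system-defined Content-Type, while Metadata is reserved for genuinely custom keys. The bucket object and paths are the ones from the question, and the uploaded-by key is just a hypothetical example.

from pathlib import Path

local_file_path = Path("~", "index.html").expanduser()

with open(local_file_path, "rb") as local_file:
    bucket.put_object(
        Key="index.html",
        Body=local_file,
        ContentType="text/html",             # system-defined Content-Type
        Metadata={"uploaded-by": "me"},      # stored as x-amz-meta-uploaded-by
    )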

Related

To upload metadata for s3 image upload using python

I am uploading an image to S3 using python boto3, but I am not able to set the content type of the S3 object to "image/png".
code:
s3.put_object(
    Bucket=settings.AWS_STORAGE_PRIVATE_BUCKET_NAME,
    Key=s3_storage_path,
    Body=file_content,
    Metadata={"Content-Type": "image/png"},
)
The system-defined Content-Type metadata gets set to "binary/octet-stream".
When using put_object, the content type is set via the ContentType parameter, not via Metadata.
ContentType (string) -- A standard MIME type describing the format of the contents. For more information, see http://www.w3.org/Protocols/rfc2616/rfc2616-sec14.html#sec14.17 .
Try:
s3.put_object(
    Bucket=settings.AWS_STORAGE_PRIVATE_BUCKET_NAME,
    Key=s3_storage_path,
    Body=file_content,
    ContentType='image/png'
)
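If you want to confirm the change, a quick check with head_object on the same bucket and key should now report the new content type:

# Inspect the stored object's headers; ContentType should now be "image/png".
response = s3.head_object(
    Bucket=settings.AWS_STORAGE_PRIVATE_BUCKET_NAME,
    Key=s3_storage_path,
)
print(response["ContentType"])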

Downloading only AWS S3 object file names and image URL in CSV Format

I have files hosted in an AWS S3 bucket, and I need all of the S3 object URLs in a CSV file.
Please suggest an approach.
You can get all of the S3 object URLs by using the AWS SDK for S3. First, you need to list all of the items in the bucket. You can port the logic from Java code similar to this:
ListObjectsRequest listObjects = ListObjectsRequest
        .builder()
        .bucket(bucketName)
        .build();

ListObjectsResponse res = s3.listObjects(listObjects);
List<S3Object> objects = res.contents();

for (ListIterator iterVals = objects.listIterator(); iterVals.hasNext(); ) {
    S3Object myValue = (S3Object) iterVals.next();
    System.out.print("\n The name of the key is " + myValue.key());
}
Then iterate through the list and get each key as shown above. For each object, you can get the URL using code similar to this:
GetUrlRequest request = GetUrlRequest.builder()
        .bucket(bucketName)
        .key(keyName)
        .build();

URL url = s3.utilities().getUrl(request);
System.out.println("The URL for " + keyName + " is " + url.toString());
Put each URL value into a collection and then write the collection out to a CSV. That is how you achieve your use case.
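Since the question is about Python, a rough boto3 sketch of the same approach might look like this. bucket_name is assumed, the virtual-hosted URL format below may need adjusting for your region and addressing style, and keys containing special characters should be URL-encoded:

import csv
import boto3

s3_client = boto3.client("s3")
region = s3_client.meta.region_name

# Collect (key, url) pairs for every object in the bucket.
rows = []
paginator = s3_client.get_paginator("list_objects_v2")
for page in paginator.paginate(Bucket=bucket_name):
    for obj in page.get("Contents", []):
        key = obj["Key"]
        url = f"https://{bucket_name}.s3.{region}.amazonaws.com/{key}"
        rows.append((key, url))

# Write the collection out to a CSV file.
with open("s3_objects.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["key", "url"])
    writer.writerows(rows)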

Support for object level Tagging in boto3 upload_file method

I want to add tags to the files as I upload them to S3. Boto3 supports specifying tags with the put_object method; however, given the expected file sizes, I am using the upload_file function, which handles multipart uploads. But this function rejects 'Tagging' as a keyword argument.
import boto3

client = boto3.client('s3', region_name='us-west-2')
client.upload_file('test.mp4', 'bucket_name', 'test.mp4',
                   ExtraArgs={'Tagging': 'type=test'})
ValueError: Invalid extra_args key 'Tagging', must be one of: ACL, CacheControl, ContentDisposition, ContentEncoding, ContentLanguage, ContentType, Expires, GrantFullControl, GrantRead, GrantReadACP, GrantWriteACP, Metadata, RequestPayer, ServerSideEncryption, StorageClass, SSECustomerAlgorithm, SSECustomerKey, SSECustomerKeyMD5, SSEKMSKeyId, WebsiteRedirectLocation
I found a way to make this work by using the S3 transfer manager directly and modifying the allowed keyword list.
from s3transfer import S3Transfer
import boto3

client = boto3.client('s3', region_name='us-west-2')
transfer = S3Transfer(client)
transfer.ALLOWED_UPLOAD_ARGS.append('Tagging')
transfer.upload_file('test.mp4', 'bucket_name', 'test.mp4',
                     extra_args={'Tagging': 'type=test'})
Even though this works, I don't think this is the best way; it might create other side effects. Currently I am not able to find the correct way to achieve this. Any advice would be great. Thanks.
The Tagging directive is now supported by boto3. You can do the following to add tags:
import boto3
from urllib import parse

s3 = boto3.client("s3")

tags = {"key1": "value1", "key2": "value2"}

s3.upload_file(
    "file_path",
    "bucket",
    "key",
    ExtraArgs={"Tagging": parse.urlencode(tags)},
)
The S3 Customization Reference in the Boto3 documentation lists the valid values for extra_args as:
ALLOWED_UPLOAD_ARGS = ['ACL', 'CacheControl', 'ContentDisposition', 'ContentEncoding', 'ContentLanguage', 'ContentType', 'Expires', 'GrantFullControl', 'GrantRead', 'GrantReadACP', 'GrantWriteACP', 'Metadata', 'RequestPayer', 'ServerSideEncryption', 'StorageClass', 'SSECustomerAlgorithm', 'SSECustomerKey', 'SSECustomerKeyMD5', 'SSEKMSKeyId', 'WebsiteRedirectLocation']
Therefore, this does not appear to be a valid way to specify a tag.
It appears that you might need to call put_object_tagging() to add the tag(s) after creating the object.
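A minimal sketch of that two-step approach, using the file and tag from the question (note that the object is briefly untagged between the two calls):

import boto3

client = boto3.client('s3', region_name='us-west-2')

# Upload first, then attach the tag in a separate request.
client.upload_file('test.mp4', 'bucket_name', 'test.mp4')
client.put_object_tagging(
    Bucket='bucket_name',
    Key='test.mp4',
    Tagging={'TagSet': [{'Key': 'type', 'Value': 'test'}]},
)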

Amazon AWS - S3 to ElasticSearch (Python Lambda)

I'd like to copy data from an S3 directory to the Amazon ElasticSearch service. I've tried following the guide, but unfortunately the part I'm looking for is missing. I don't know what the lambda function itself should look like (all the guide says about this is: "Place your application source code in the eslambda folder."). I'd like ES to auto-index the files.
Currently I'm trying
for record in event['Records']:
    bucket = record['s3']['bucket']['name']
    key = urllib.unquote_plus(record['s3']['object']['key'])
    index_name = event.get('index_name', key.split('/')[0])
    object = s3_client.Object(bucket, key)
    data = object.get()['Body'].read()
    helpers.bulk(es, data, chunk_size=100)
But I get a massive error stating:
elasticsearch.exceptions.RequestError: TransportError(400, u'action_request_validation_exception', u'Validation Failed: 1: index is missing;2: type is missing;3: index is missing;4: type is missing;5: index is missing;6: type is missing;7: ...
Could anyone explain how I can set things up so that my data gets moved from S3 to ES, where it gets auto-mapped and auto-indexed? Apparently it's possible, as mentioned in the references here and here.
While mappings can be assigned automatically in Elasticsearch, you still have to specify the index name and type in the request. If that index does not exist yet, Elasticsearch will create it automatically when the first document is indexed.
Based on your error, it looks like you're not passing through an index and type.
For example, here's a simple POST request that adds a record to the index MyIndex and type MyType, creating the index and type first if they do not already exist:
curl -XPOST 'example.com:9200/MyIndex/MyType/' \
-d '{"name":"john", "tags" : ["red", "blue"]}'
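Applied to the Lambda code in the question, that means each action handed to helpers.bulk needs an index and type, either on the action itself or via the bulk helper's arguments. A sketch, assuming the S3 object contains one JSON document per line; index_name, data, and es are the variables from the question's code, and "my-type" is a placeholder:

import json

actions = (
    {
        "_index": index_name,
        "_type": "my-type",            # placeholder type name
        "_source": json.loads(line),
    }
    for line in data.splitlines()
    if line.strip()
)
helpers.bulk(es, actions, chunk_size=100)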
I wrote a script to download a CSV file from S3 and then transfer the data to ES:
1. Made an S3 client using boto3 and downloaded the file from S3.
2. Made an ES client to connect to Elasticsearch.
3. Opened the CSV file and used the helpers module from elasticsearch to insert the CSV file contents into Elasticsearch.
main.py
import boto3
from elasticsearch import helpers, Elasticsearch
import csv
import os
from config import *

# S3
Downloaded_Filename = os.path.basename(Prefix)
s3 = boto3.client('s3', aws_access_key_id=awsaccesskey, aws_secret_access_key=awssecretkey, region_name=awsregion)
s3.download_file(Bucket, Prefix, Downloaded_Filename)

# ES
ES_index = Downloaded_Filename.split(".")[0]
ES_client = Elasticsearch([ES_host], http_auth=(ES_user, ES_password), port=ES_port)

# S3 to ES
with open(Downloaded_Filename) as f:
    reader = csv.DictReader(f)
    helpers.bulk(ES_client, reader, index=ES_index, doc_type='my-type')
config.py
awsaccesskey = ""
awssecretkey = ""
awsregion = "us-east-1"
Bucket=""
Prefix=''
ES_host = "localhost"
ES_port = "9200"
ES_user = "elastic"
ES_password = "changeme"

Use put_internal to upload file using python eve

I want to store media files on certain Mongo documents.
I was thinking of using Eve's put_internal method to update the document.
How would I use the payload param to provide the file as the payload?
You want to provide the file value as a FileStorage object. So, supposing your media field is called media, a hypothetical payload would look like this:
{'media': <FileStorage: u'example.jpg' ('image/jpeg')>, ...}
In order to achieve that you would do something like:
from werkzeug.datastructures import FileStorage

# Open the file in binary mode and wrap it in a FileStorage object.
f = open('example.jpg', 'rb')
fs = FileStorage(f)
payload['media'] = fs
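From there, a rough sketch of the put_internal call itself, made inside the Eve app's context. The documents resource name and document_id lookup are hypothetical, and the exact return value can vary between Eve versions:

from eve.methods.put import put_internal

# Replace the document identified by document_id with the given payload,
# which carries the FileStorage object under the 'media' key.
with app.test_request_context():
    response = put_internal('documents', payload, **{'_id': document_id})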
