I am using the boto client to download and upload files to S3, as well as to do a whole bunch of other things, like copying a key from one folder to another. The problem arises when I try to copy a key whose size is 0 bytes. The code that I use to copy is below:
import boto

# Get the connection to the bucket
conn = boto.connect_s3(AWS_KEY, SECRET_KEY)
bucket = conn.get_bucket('mybucket')
# bucket.name is the name of my bucket
# candidate is the source key
destination_key = "destination/path/on/s3"
candidate = "the/file/to/copy"
# now copy the key
bucket.copy_key(destination_key, bucket.name, candidate) # --> This throws an exception
# just in case, see if the key ended up in the destination.
copied_key = bucket.lookup(destination_key)
The exception that I get is
S3ResponseError: 404 Not Found
<Error><Code>NoSuchKey</Code>
<Message>The specified key does not exist.</Message>
<Key>the/file/to/copy</Key><RequestId>ABC123</RequestId><HostId>XYZ123</HostId>
</Error>
Now I have verified that the key does in fact exist by logging into the AWS console and navigating to the source location: the key is there, and the console shows its size as 0 (there are cases in my application where I may end up with empty files, but I need them on S3).
So upload works fine and boto uploads the key without any issue, but when I attempt to copy it I get an error saying the key does not exist.
So is there any other logic that I should be using to copy such keys? Any help in this regard would be appreciated.
Make sure you include the bucket of the source key; it should be something like bucket/path/to/file/to/copy.
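As a rough sketch (reusing the connection, bucket, and paths from the question), copy_key takes the destination key name, the source bucket name, and the source key name, in that order:
import boto

conn = boto.connect_s3(AWS_KEY, SECRET_KEY)
bucket = conn.get_bucket('mybucket')
# copy_key(new_key_name, src_bucket_name, src_key_name):
# the source bucket name is passed explicitly as the second argument
bucket.copy_key('destination/path/on/s3', 'mybucket', 'the/file/to/copy')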
Try this:
from boto.s3.key import Key
download_path = '/tmp/dest_test.jpg'
bucket_key = Key(bucket)
bucket_key.key = file_key # e.g. images/source_test.jpg
bucket_key.get_contents_to_filename(download_path)
When I try to print the load balancers from AWS I get a huge dictionary with a lot of keys, but when I try to print only the 'LoadBalancerName' value I get None. I want to print the names of all the load balancers in our environment. How can I do it? Thanks!
What I tried:
import boto3
client = boto3.client('elbv2')
elb = client.describe_load_balancers()
Name = elb.get('LoadBalancerName')
print(Name)
The way you were handling the response object was incorrect, and you'll need a loop if you want all the names and not just one. What you'll need is this:
import boto3
client = boto3.client('elbv2')
elb = client.describe_load_balancers()
# 'LoadBalancers' is a list of dicts, one per load balancer
for i in elb['LoadBalancers']:
    print(i['LoadBalancerArn'])
    print(i['LoadBalancerName'])
However, if you're still getting None as a value, it's worth double-checking which region the load balancers are in, and whether you need to pass a profile as well (see the sketch below).
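A minimal sketch using a boto3 Session to pin both; the profile and region names here are just placeholders:
import boto3

# 'my-profile' and 'eu-west-1' are placeholder values, substitute your own
session = boto3.Session(profile_name='my-profile', region_name='eu-west-1')
client = session.client('elbv2')

for lb in client.describe_load_balancers()['LoadBalancers']:
    print(lb['LoadBalancerName'])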
I'd like to copy data from an S3 directory to the Amazon Elasticsearch Service. I've tried following the guide, but unfortunately the part I'm looking for is missing: I don't know what the Lambda function itself should look like (all the guide says about this is "Place your application source code in the eslambda folder."). I'd like ES to auto-index the files.
Currently I'm trying:
for record in event['Records']:
    bucket = record['s3']['bucket']['name']
    key = urllib.unquote_plus(record['s3']['object']['key'])
    index_name = event.get('index_name', key.split('/')[0])
    object = s3_client.Object(bucket, key)
    data = object.get()['Body'].read()
    helpers.bulk(es, data, chunk_size=100)
But I get a massive error stating:
elasticsearch.exceptions.RequestError: TransportError(400, u'action_request_validation_exception', u'Validation Failed: 1: index is missing;2: type is missing;3: index is missing;4: type is missing;5: index is missing;6: type is missing;7: ...
Could anyone explain how I can set things up so that my data gets moved from S3 to ES, where it gets auto-mapped and auto-indexed? Apparently it's possible, as mentioned in the references here and here.
While mappings can be assigned automatically in Elasticsearch, indexes are not generated automatically; you have to specify the index name and type in the request. If that index does not exist, Elasticsearch will then create it automatically.
Based on your error, it looks like you're not passing through an index and type.
For example, here's a simple POST request to add a record to the index MyIndex and type MyType, which would first create the index and type if they did not already exist:
curl -XPOST 'example.com:9200/MyIndex/MyType/' \
-d '{"name":"john", "tags" : ["red", "blue"]}'
I wrote a script to download a csv file from S3 and then transfer the data to ES.
Made an S3 client using boto3 and downloaded the file from S3.
Made an ES client to connect to Elasticsearch.
Opened the CSV file and used the helpers module from elasticsearch to insert the CSV file contents into Elasticsearch.
main.py
import boto3
from elasticsearch import helpers, Elasticsearch
import csv
import os
from config import *
#S3
Downloaded_Filename=os.path.basename(Prefix)
s3 = boto3.client('s3', aws_access_key_id=awsaccesskey,aws_secret_access_key=awssecretkey,region_name=awsregion)
s3.download_file(Bucket,Prefix,Downloaded_Filename)
#ES
ES_index = Downloaded_Filename.split(".")[0]
ES_client = Elasticsearch([ES_host],http_auth=(ES_user, ES_password),port=ES_port)
#S3 to ES
with open(Downloaded_Filename) as f:
    reader = csv.DictReader(f)
    helpers.bulk(ES_client, reader, index=ES_index, doc_type='my-type')
config.py
awsaccesskey = ""
awssecretkey = ""
awsregion = "us-east-1"
Bucket=""
Prefix=''
ES_host = "localhost"
ES_port = "9200"
ES_user = "elastic"
ES_password = "changeme"
I have a Python lambda script that shrinks images as they are uploaded to S3. When the uploaded filename contains non-ASCII characters (Hebrew in my case), I cannot get the object (Forbidden as if the file doesn't exist).
Here's (some of) my code:
s3_client = boto3.client('s3')
def handler(event, context):
    for record in event['Records']:
        bucket = record['s3']['bucket']['name']
        key = record['s3']['object']['key']
        s3_client.download_file(bucket, key, "/tmp/somefile")
This raises "An error occurred (403) when calling the HeadObject operation: Forbidden" (a ClientError). I also see in the log that the key contains characters like %D7%92.
Following some sources on the web (http://blog.rackspace.com/the-devnull-s3-bucket-hacking-with-aws-lambda-and-python/), I also tried to unquote the key like so, with no luck:
key = urllib.unquote_plus(record['s3']['object']['key'])
Same error, although this time the log states that I'm trying to retrieve a key with characters like this: פ×קס×.
Note that this script is verified to work on English keys, and the tests were done on keys with no spaces.
This worked for me:
import urllib.parse

encodedStr = 'My+name+is+Tarak'
urllib.parse.unquote_plus(encodedStr)
# returns "My name is Tarak"
I had a similar problem. I solved it by adding an encode before doing the unquote:
key = urllib.unquote_plus(event['Records'][0]['s3']['object']['key'].encode("utf8"))
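In the context of the handler from the question, that would look roughly like this (Python 2 urllib, and the same s3_client as in the question):
import urllib

def handler(event, context):
    for record in event['Records']:
        bucket = record['s3']['bucket']['name']
        # decode the URL-encoded UTF-8 key before using it
        key = urllib.unquote_plus(record['s3']['object']['key'].encode('utf8'))
        s3_client.download_file(bucket, key, '/tmp/somefile')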
Here's some sample code for copying an S3 key. There are a lot of reasons you might want to do this, one of which is to update key metadata. And while this does seem to be the widely accepted solution for that, there is a big issue: when I run the example below, I actually lose my Content-Type, which defaults back to 'application/octet-stream' (not very useful if you're trying to serve web images).
from boto.s3.connection import S3Connection
from boto.s3.key import Key

# Get bucket
conn = S3Connection(self._aws_key, self._aws_secret)
bucket = conn.get_bucket(self._aws_bucket)
# Create key
k = Key(bucket)
k.key = key
# Copy old key
k.metadata.update({ meta_key: meta_value })
k2 = k.copy(k.bucket.name, k.name, k.metadata, preserve_acl=True)
k = k2
Any ideas? Thanks.
The following GitHub Gist worked for me:
import boto
s3 = boto.connect_s3()
bucket = s3.lookup('mybucket')
key = bucket.lookup('mykey')
# Copy the key onto itself, preserving the ACL but changing the content-type
key.copy(key.bucket, key.name, preserve_acl=True, metadata={'Content-Type': 'text/plain'})
key = bucket.lookup('mykey')
print key.content_type
Took a looong time to run though!
Take a look at this post. You need to do a
key = bucket.get_key(key.name)
and then
metadata['Content-Type'] = key.content_type
will work; otherwise, key.content_type will return application/octet-stream.
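Putting that together with the copy from the question, a rough sketch (reusing the bucket, k, meta_key, and meta_value names from the snippets above) might be:
# Re-fetch the key so content_type is populated from S3
k = bucket.get_key(k.name)

# Carry the original Content-Type across the copy along with the new metadata
metadata = k.metadata
metadata['Content-Type'] = k.content_type
metadata[meta_key] = meta_value
k = k.copy(k.bucket.name, k.name, metadata, preserve_acl=True)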
When I try to delete a bucket using the lines:
conn = boto.connect_s3(AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY)
print conn.delete_bucket('BucketNameHere').message
It tells me the bucket I tried to delete is not empty.
The bucket has no keys in it. But it does have versions.
How can I delete the versions?
I can see the list of versions using bucket.list_versions()
Java has a deleteVersion method on its S3 connection. I found that code here:
http://bytecoded.blogspot.com/2011/01/recursive-delete-utility-for-version.html
He does this line to delete the version:
s3.deleteVersion(new DeleteVersionRequest(bucketName, keyName, versionId));
Is there anything comparable in boto?
Boto does support versioned buckets after version 1.9c. Here's how it works:
import boto
s3 = boto.connect_s3()
#Create a versioned bucket
bucket = s3.create_bucket("versioned.example.com")
bucket.configure_versioning(True)
#Create a new key and make a few versions
key = bucket.new_key("versioned_object")
key.set_contents_from_string("Version 1")
key.set_contents_from_string("Version 2")
#Try to delete bucket
bucket.delete() ## FAILS with 409 Conflict
#Delete our key then try to delete our bucket again
bucket.delete_key("versioned_object")
bucket.delete() ## STILL FAILS with 409 Conflict
#Let's see what's in there
list(bucket.list()) ## Returns empty list []
#What's in there including versions?
list(bucket.list_versions()) ## Returns list of keys and delete markers
#This time delete all versions including delete markers
for version in bucket.list_versions():
    # NOTE: we're still using bucket.delete_key, we're just adding the version_id parameter
    bucket.delete_key(version.name, version_id=version.version_id)
#Now what's in there
list(bucket.list_versions()) ## Returns empty list []
#Ok, now delete the bucket
bucket.delete() ## SUCCESS!!