scrapy store images to amazon s3 - python

I store images on my local server and then upload them to S3.
Now I want to change this so that images are stored directly on Amazon S3,
but there is an error:
boto.exception.S3ResponseError: S3ResponseError: 403 Forbidden
Here is my settings.py:
AWS_ACCESS_KEY_ID = "XXXX"
AWS_SECRET_ACCESS_KEY = "XXXX"
IMAGES_STORE = 's3://how.are.you/'
Do I need to add something?
My Scrapy version: Scrapy==0.22.2
Please guide me, thank you!

AWS_ACCESS_KEY_ID = "xxxxxx"
AWS_SECRET_ACCESS_KEY = "xxxxxx"
IMAGES_STORE = "s3://bucketname/virtual_path/"
how.are.you should be an S3 bucket that exists in your S3 account; it is where the uploaded images will be stored. If you want to store images under a virtual_path, you need to create that folder in your S3 bucket.
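For reference, a minimal sketch of the relevant settings.py entries (assuming the stock ImagesPipeline that ships with Scrapy 0.22 under the scrapy.contrib namespace; the bucket name and prefix are placeholders):
# settings.py
ITEM_PIPELINES = {'scrapy.contrib.pipeline.images.ImagesPipeline': 1}

AWS_ACCESS_KEY_ID = "xxxxxx"
AWS_SECRET_ACCESS_KEY = "xxxxxx"

# The bucket must already exist; "virtual_path/" is just a key prefix inside it.
IMAGES_STORE = "s3://bucketname/virtual_path/"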

I found that the cause of the problem is the upload policy. The function Key.set_contents_from_string() takes a policy argument, which defaults to S3FileStore.POLICY. So modify the code in scrapy/contrib/pipeline/files.py, changing
return threads.deferToThread(k.set_contents_from_string, buf.getvalue(),
                             headers=h, policy=self.POLICY)
to
return threads.deferToThread(k.set_contents_from_string, buf.getvalue(),
                             headers=h)
Maybe you can try it, and share the result here.

I think the problem is not in your code; it more likely lies in permissions. Check your credentials first and make sure they allow you to access and write to the S3 bucket.
import boto
s3 = boto.connect_s3('access_key', 'secret_key')
bucket = s3.lookup('bucket_name')
key = bucket.new_key('testkey')
key.set_contents_from_string('This is a test')
key.delete()
If the test runs successfully, then look into your permissions; for setting permissions you can look at the Amazon configuration documentation.
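If the write test fails, a quick way to see which grants your credentials actually have on the bucket is to dump its ACL - a sketch using boto 2, with the bucket name from the question as a placeholder:
import boto

s3 = boto.connect_s3('access_key', 'secret_key')
bucket = s3.get_bucket('how.are.you', validate=False)

# Each grant lists a permission (READ, WRITE, FULL_CONTROL, ...) and who holds it.
for grant in bucket.get_acl().acl.grants:
    print("%s %s %s" % (grant.permission, grant.display_name, grant.uri))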


Python boto3 checking for valid bucket within region

I have seen examples for checking whether an S3 bucket exists and have implemented them below. My bucket is located in the us-east-1 region, but the following code (with the session configured for ap-south-1) doesn't throw an exception. Is there a way to make the check region-specific, depending on my session?
import boto3
from botocore.exceptions import ClientError

session = boto3.Session(
    profile_name='TEST',
    region_name='ap-south-1'
)
s3 = session.resource('s3')
bucket_name = 'TEST_BUCKET'
try:
    s3.meta.client.head_bucket(Bucket=bucket_name)
except ClientError as c:
    print(c)
It does not matter which S3 regional endpoint you send the request to. The underlying SDK (boto3) will redirect as needed. It's preferable, however, to target the correct region if you know it in advance, to save on redirects.
You can see this in detail if you use the awscli in debug mode:
aws s3api head-bucket --bucket mybucket --region ap-south-1 --debug
You will see debug output similar to this:
DEBUG - S3 client configured for region ap-south-1 but the bucket mybucket is in region us-east-1; Please configure the proper region to avoid multiple unnecessary redirects and signing attempts.
DEBUG - Switching signature version for service s3 to version s3v4 based on config file override.
DEBUG - Updating URI from https://s3.ap-south-1.amazonaws.com/mybucket to https://s3.us-east-1.amazonaws.com/mybucket
Note that the awscli uses the boto3 SDK, as does your Python script.
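If you specifically want to confirm that the bucket lives in the region you expect (rather than just that it exists), one approach is to compare the result of get_bucket_location with your session's region - a sketch, assuming your credentials are allowed to call GetBucketLocation:
import boto3
from botocore.exceptions import ClientError

session = boto3.Session(profile_name='TEST', region_name='ap-south-1')
client = session.client('s3')
bucket_name = 'TEST_BUCKET'

try:
    client.head_bucket(Bucket=bucket_name)  # existence/permission check only
    # LocationConstraint is None for buckets in us-east-1
    region = client.get_bucket_location(Bucket=bucket_name)['LocationConstraint'] or 'us-east-1'
    if region != session.region_name:
        print('Bucket exists, but in %s rather than %s' % (region, session.region_name))
except ClientError as c:
    print(c)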

boto3 ec2 & django [duplicate]

On boto I used to specify my credentials when connecting to S3 like this:
import boto
from boto.s3.connection import Key, S3Connection
S3 = S3Connection( settings.AWS_SERVER_PUBLIC_KEY, settings.AWS_SERVER_SECRET_KEY )
I could then use S3 to perform my operations (in my case deleting an object from a bucket).
With boto3 all the examples I found are such:
import boto3
S3 = boto3.resource( 's3' )
S3.Object( bucket_name, key_name ).delete()
I couldn't specify my credentials, and thus all attempts failed with an InvalidAccessKeyId error.
How can I specify credentials with boto3?
You can create a session:
import boto3
session = boto3.Session(
    aws_access_key_id=settings.AWS_SERVER_PUBLIC_KEY,
    aws_secret_access_key=settings.AWS_SERVER_SECRET_KEY,
)
Then use that session to get an S3 resource:
s3 = session.resource('s3')
You can get a client with a new session directly, like below:
s3_client = boto3.client(
    's3',
    aws_access_key_id=settings.AWS_SERVER_PUBLIC_KEY,
    aws_secret_access_key=settings.AWS_SERVER_SECRET_KEY,
    region_name=REGION_NAME
)
This is older, but I'm placing it here for my reference too. boto3.resource just uses the default session; you can pass the same session details through to boto3.resource.
Help on function resource in module boto3:

resource(*args, **kwargs)
    Create a resource service client by name using the default session.
    See :py:meth:`boto3.session.Session.resource`.
https://github.com/boto/boto3/blob/86392b5ca26da57ce6a776365a52d3cab8487d60/boto3/session.py#L265
You can see that it just takes the same arguments as boto3.Session.
import boto3

S3 = boto3.resource(
    's3',
    region_name='us-west-2',
    aws_access_key_id=settings.AWS_SERVER_PUBLIC_KEY,
    aws_secret_access_key=settings.AWS_SERVER_SECRET_KEY
)
S3.Object(bucket_name, key_name).delete()
I'd like to expand on @JustAGuy's answer. The method I prefer is to use the AWS CLI to create a config file. The reason is that, with the config file, the CLI or the SDK will automatically look for credentials in the ~/.aws folder. And the good thing is that the AWS CLI is written in Python.
You can get the CLI from PyPI if you don't have it already. Here are the steps to set it up from the terminal:
$> pip install awscli  # you can add the --user flag
$> aws configure
AWS Access Key ID [****************ABCD]:[enter your key here]
AWS Secret Access Key [****************xyz]:[enter your secret key here]
Default region name [us-west-2]:[enter your region here]
Default output format [None]:
After this you can use boto3 and any of the APIs without having to specify keys (unless you want to use different credentials).
If you rely on your ~/.aws/credentials file to store the id and key for a user, they will be picked up automatically.
For instance
session = boto3.Session(profile_name='dev')
s3 = session.resource('s3')
This will pick up the dev profile (user) if your credentials file contains the following:
[dev]
aws_access_key_id = AAABBBCCCDDDEEEFFFGG
aws_secret_access_key = FooFooFoo
region=ap-southeast-2
There are numerous ways to store credentials while still using boto3.resource().
I'm using the AWS CLI method myself. It works perfectly.
https://boto3.amazonaws.com/v1/documentation/api/latest/guide/configuration.html
You can set the default AWS environment variables for the access and secret keys - that way you don't need to change the default client creation code - though it is better to pass them as parameters if you have non-default credentials.
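A sketch of that environment-variable approach (the variable names are the standard ones boto3 looks for; the values here are placeholders):
import os

# Set before any boto3 client/resource is created; boto3 reads these automatically.
os.environ['AWS_ACCESS_KEY_ID'] = 'AAABBBCCCDDDEEEFFFGG'
os.environ['AWS_SECRET_ACCESS_KEY'] = 'FooFooFoo'
os.environ['AWS_DEFAULT_REGION'] = 'us-west-2'

import boto3

s3 = boto3.resource('s3')  # no explicit credentials needed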

Upload image to S3 python

I am trying to upload an image to S3 through Python. My code looks like this:
import os
from PIL import Image
import boto
from boto.s3.key import Key
def upload_to_s3(aws_access_key_id, aws_secret_access_key, file, bucket, key,
                 callback=None, md5=None, reduced_redundancy=False, content_type=None):
    conn = boto.connect_s3(aws_access_key_id, aws_secret_access_key)
    bucket = conn.get_bucket(bucket, validate=False)
    k = Key(bucket)
    k.key = key
    k.set_contents_from_file(file)
AWS_ACCESS_KEY = "...."
AWS_ACCESS_SECRET_KEY = "....."
filename = "images/image_0.jpg"
file = Image.open(filename)
key = "image"
bucket = 'images'
upload_to_s3(AWS_ACCESS_KEY, AWS_ACCESS_SECRET_KEY, file, bucket, key)
I am getting this error message:
S3ResponseError: S3ResponseError: 400 Bad Request
<?xml version="1.0" encoding="UTF-8"?>
<Error><Code>InvalidRequest</Code><Message> The authorization mechanism you have provided is not supported. Please use AWS4-HMAC-SHA256.</Message>
<RequestId>90593132BA5E6D6C</RequestId>
<HostId>...</HostId></Error>
This code is based on the tutorial from this website: http://stackabuse.com/example-upload-a-file-to-aws-s3/
I have tried k.set_contents_from_file as well as k.set_contents_from_filename, but neither works for me.
The error says something about using AWS4-HMAC-SHA256, but I am not sure how to do that. Is there another way to solve this problem besides using AWS4-HMAC-SHA256? If anyone can help me out, I would really appreciate it.
Thank you!
Just use:
import boto3
client = boto3.client('s3', region_name='us-west-2')
client.upload_file('images/image_0.jpg', 'mybucket', 'image_0.jpg')
Try to avoid putting your credentials in the code. Instead:
If you are running the code from an Amazon EC2 instance, simply assign an IAM Role to the instance with appropriate permissions. The credentials will automatically be used.
If you are running the code on your own computer, use the AWS Command-Line Interface (CLI) aws configure command to store your credentials in a file, which will be automatically used by your code.
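For the original use case of uploading a PIL Image object rather than a file already on disk, a sketch along the same lines (same credential setup as above; the bucket and key names are placeholders):
import io

import boto3
from PIL import Image

client = boto3.client('s3', region_name='us-west-2')

img = Image.open('images/image_0.jpg')
buf = io.BytesIO()
img.save(buf, format='JPEG')   # re-encode the image into an in-memory buffer
buf.seek(0)

client.upload_fileobj(buf, 'mybucket', 'images/image_0.jpg')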

Ceph radosgw - bucket policy - make all objects public-read by default

I work with a group of non-developers who upload objects to an S3-style bucket through radosgw. All uploaded objects need to be publicly available, but they cannot do this programmatically. Is there a way to make the default permission of an object public-read so this does not have to be set manually every time? There has to be a way to do this with boto, but I've yet to find any examples. There are a few floating around using AWS's GUI, but that is not an option for me. :(
I am creating a bucket like this:
#!/usr/bin/env python
import boto
import boto.s3.connection
access_key = "SAMPLE3N84XBEHSAMPLE"
secret_key = "SAMPLEc4F3kfvVqHjMAnsALY8BCQFwTkI3SAMPLE"
conn = boto.connect_s3(
    aws_access_key_id=access_key,
    aws_secret_access_key=secret_key,
    host='10.1.1.10',
    is_secure=False,
    calling_format=boto.s3.connection.OrdinaryCallingFormat(),
)
bucket = conn.create_bucket('public-bucket', policy='public-read')
I am setting the policy to public-read which seems to allow people to browse the bucket as a directory, but the objects within the bucket do not inherit this permission.
>>> print bucket.get_acl()
<Policy: http://acs.amazonaws.com/groups/global/AllUsers = READ, S3 Newbie (owner) = FULL_CONTROL>
To clarify, I do know I can resolve this on a per-object basis like this:
key = bucket.new_key('thefile.tgz')
key.set_contents_from_filename('/home/s3newbie/thefile.tgz')
key.set_canned_acl('public-read')
But my end users are not capable of doing this, so I need a way to make this the default permission of an uploaded file.
I found a solution to my problem.
First, many thanks to joshbean who posted this: https://github.com/awsdocs/aws-doc-sdk-examples/blob/master/python/example_code/s3/s3-python-example-put-bucket-policy.py
I noticed he was using the boto3 library, so I started using it for my connection.
import boto3
import json
access_key = "SAMPLE3N84XBEHSAMPLE"
secret_key = "SAMPLEc4F3kfvVqHjMAnsALY8BCQFwTkI3SAMPLE"
conn = boto3.client(
    's3', 'us-east-1',
    endpoint_url="http://mycephinstance.net",
    aws_access_key_id=access_key,
    aws_secret_access_key=secret_key)

bucket = "public-bucket"

bucket_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "AddPerm",
            "Effect": "Allow",
            "Principal": "*",
            "Action": ["s3:GetObject"],
            "Resource": ["arn:aws:s3:::{0}/*".format(bucket)]
        }
    ]
}
bucket_policy = json.dumps(bucket_policy)

conn.put_bucket_policy(Bucket=bucket, Policy=bucket_policy)
Now when an object is uploaded to public-bucket, it can be downloaded anonymously without explicitly setting the key permission to public-read or generating a download URL.
If you're doing this, be REALLY REALLY certain that it's ok for ANYONE to download this stuff. Especially if your radosgw service is publicly accessible on the internet.
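To verify the policy is doing what you expect, you can try fetching an object with an unsigned (anonymous) client - a sketch, reusing the endpoint, bucket and key names from the examples above:
import boto3
from botocore import UNSIGNED
from botocore.config import Config

# No credentials at all: if the bucket policy works, this download succeeds.
anon = boto3.client('s3',
                    endpoint_url="http://mycephinstance.net",
                    config=Config(signature_version=UNSIGNED))
anon.download_file('public-bucket', 'thefile.tgz', '/tmp/thefile.tgz')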

how to download a file from s3 bucket with a temporary token in python

I have a django web app and I want to allow it to download files from my s3 bucket.
The files are not public. I have an IAM policy to access them.
The problem is that I do NOT want to download the file on the django app server and then serve it to download on the client. That is like downloading twice. I want to be able to download directly on the client of the django app.
Also, I don't think it's safe to pass my IAM credentials in an http request so I think I need to use a temporary token.
I read:
http://docs.aws.amazon.com/IAM/latest/UserGuide/id_credentials_temp_use-resources.html
but I just do not understand how to generate a temporary token on the fly.
A python solution (maybe using boto) would be appreciated.
With Boto (2), it is easy to generate time-limited download URLs, provided your IAM policy has the proper permissions. I am using this approach to serve videos to logged-in users from a private S3 bucket.
from boto.s3.connection import S3Connection
conn = S3Connection('<aws access key>', '<aws secret key>')
bucket = conn.get_bucket('mybucket')
key = bucket.get_key('mykey', validate=False)
url = key.generate_url(86400)
This generates a download URL for the key mykey in the given bucket, valid for 24 hours (86400 seconds). Without validate=False, Boto 2 first checks that the key actually exists in the bucket and throws an exception if it does not. For server-controlled files this is often an unnecessary extra step, hence validate=False in the example.
In Boto3 the API is quite different:
import boto3

s3 = boto3.client('s3')

# Generate a presigned URL to get 'mykey' from 'mybucket'
url = s3.generate_presigned_url(
    ClientMethod='get_object',
    Params={
        'Bucket': 'mybucket',
        'Key': 'mykey'
    },
    ExpiresIn=86400
)
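In the Django view you can then simply redirect the client to the presigned URL, so the browser downloads straight from S3 instead of streaming through the app server - a minimal sketch; the view, bucket and key names are hypothetical:
import boto3
from django.shortcuts import redirect

def download(request, key):
    s3 = boto3.client('s3')
    url = s3.generate_presigned_url(
        ClientMethod='get_object',
        Params={'Bucket': 'mybucket', 'Key': key},
        ExpiresIn=3600  # one hour
    )
    return redirect(url)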
