How to unzip a file from S3 using EC2? - python

I have a 150 GB file in S3. I would like to unzip it and upload each file back to S3. What is the best approach to do this with Python and EC2? I appreciate your response.

Download it on your system.
Unzip the archive and it will create a normal folder, say "Unzipped_Folder".
Assuming you are using Windows, install the aws-cli on it.
Create an IAM User with S3 write access to that bucket and generate an Access Key and a Secret Access Key for it.
In the aws-cli, add the credentials from the command prompt:
$ aws configure
Now run the following command to send the files to the S3 bucket:
$ aws s3 cp your_directory_path s3://your_bucket_name --recursive
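If you want to do the whole round trip in Python on the EC2 instance instead of the CLI, a minimal boto3 sketch could look like the following; the bucket name, key, and prefix are placeholders, and the instance needs enough disk space to hold the 150 GB archive.

import zipfile
import boto3

BUCKET = "your-bucket"            # placeholder bucket name
ZIP_KEY = "big-archive.zip"       # placeholder key of the 150 GB zip
DEST_PREFIX = "unzipped/"         # placeholder prefix for the extracted files

s3 = boto3.client("s3")

# 1. Download the archive to the instance's local disk.
local_zip = "/tmp/big-archive.zip"
s3.download_file(BUCKET, ZIP_KEY, local_zip)

# 2. Walk the archive and stream each member straight back to S3.
with zipfile.ZipFile(local_zip) as zf:
    for member in zf.infolist():
        if member.is_dir():
            continue
        with zf.open(member) as fileobj:
            s3.upload_fileobj(fileobj, BUCKET, DEST_PREFIX + member.filename)

upload_fileobj streams each member from the open zip, so the extracted files never need to be written to disk individually.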

Related

How to pass --no-verify-ssl to AWS CLI from Python?

I am new to AWS CLI and finally have my commands working through Git Bash with:
aws s3 ls --no-verify-ssl
I am now trying to run the same commands from Python.
I need to be able to do the following tasks in AWS s3 from Python:
Copy hundreds of local folders to the s3 bucket.
Update existing folders on the s3 bucket with changes made on local versions.
List contents of the s3 bucket.
In reading similar posts here, I see that --no-verify-ssl means there is a bigger problem; however, using it is how our network people have set things up, and I have no control over that. This is the flag they require to allow access to the AWS CLI.
I have tried using boto3 and running the Python command there, but I get an authentication error because I don't know how to pass the --no-verify-ssl flag from Python.
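In case it helps, boto3 lets you disable certificate verification when creating a client, which is the programmatic counterpart of --no-verify-ssl. A minimal sketch, with the bucket name and file paths as placeholders:

import boto3
import urllib3

# Optional: silence the warning urllib3 emits when verification is off.
urllib3.disable_warnings(urllib3.exceptions.InsecureRequestWarning)

# verify=False is the boto3 equivalent of the CLI's --no-verify-ssl flag.
s3 = boto3.client("s3", verify=False)

# List the contents of a bucket (the equivalent of `aws s3 ls`).
response = s3.list_objects_v2(Bucket="my-bucket")
for obj in response.get("Contents", []):
    print(obj["Key"])

# Upload a local file (a building block for copying your folders up).
s3.upload_file("local/path/file.txt", "my-bucket", "remote/prefix/file.txt")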

How can I download encrypted .gz file from AWS S3 bucket

I back up a Nextcloud instance, compress and encrypt it, and then I store it in an S3 bucket. When I look in the bucket I have a file called backup.tar.gz. Great.
Now I download the file from the bucket and I get a .tar file, which obviously renders my backup completely useless. I can't decrypt it. If I manually save the file with the extension .gz it still doesn't decrypt, but says tar: Error opening archive: Unrecognized archive format.
When I try to download it via the CLI it doesn't download the file but says usage: aws [options] <command> <subcommand> [<subcommand> ...] [parameters]:
aws s3 cp s3://mybucket/myfile.tar.gz
The problem is mentioned also here: https://forums.aws.amazon.com/thread.jspa?threadID=250926 where the only answer is, "why not use AWS snapshot to backup your EC2 instance".
I don't get why AWS would do that, or is this a bug? Is there any decent way to get my backup from my bucket in the format I specify?
I could also do this with a Python script if necessary, but I don't know if that would work any better.
Help is very much appreciated! Thanks in advance!
I encrypt like this:
tar -czf - /path/to/file | openssl enc -e -aes256 -pbkdf2 -salt -out backup.tar.gz -pass pass:$ENCRYPTION_PASS
You can download a file from S3 using the awscli as follows:
aws s3 cp s3://mybucket/myfile.tar.gz .
Note the trailing period, which means 'to the current folder'. The usage error you saw is because aws s3 cp requires both a source and a destination, and your command omitted the destination.
Or you could rename the file during download, as follows:
aws s3 cp s3://mybucket/myfile.tar.gz xyz.tar.gz
It's possible, I suppose, that if you download a tar.gz using a browser or other tool (not the awscli) then that tool might try to be helpful and uncompress the GZ file for you in flight, resulting in a TAR file. But the awscli won't do that natively.
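If you do end up scripting the download in Python, boto3 copies the object bytes verbatim, so the .tar.gz arrives exactly as stored, with no in-flight decompression. A minimal sketch, with the bucket and key as placeholders:

import boto3

s3 = boto3.client("s3")
# download_file(bucket, key, local_filename) -- the bytes are not transformed.
s3.download_file("mybucket", "myfile.tar.gz", "backup.tar.gz")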

Run Python Script on AWS and transfer 5GB of files to EC2

I am an absolute beginner in AWS. I have created a key and an instance. The Python script I want to run in the EC2 environment needs to loop through around 80,000 filings, tokenize the sentences in them, and use these sentences for some unsupervised learning.
This might be a duplicate, but I can't find a way to copy these filings to the EC2 environment and run the Python script there, and I am also not sure how to use boto3. I am using macOS. I am just looking for any way to speed things up. Thank you so much!
Here's what I tried recently:
Create the bucket and make it publicly accessible.
Create the role and add the HTTP option.
Upload all the files and make sure they are publicly accessible.
Get the HTTP link of the S3 file.
Connect to the instance through PuTTY.
Use wget to copy the file into the EC2 instance.
If your files are zipped, a single copy is enough to move all the files into the instance.
Here's one way that might help:
create a simple IAM role that allows S3 access to the bucket holding your files
apply that IAM role to the running EC2 instance (or launch a new instance with the IAM role)
install the awscli on the EC2 instance
SSH to the instance and sync the S3 files to the EC2 instance using aws s3 sync
run your app
I'm assuming you've launched EC2 with enough diskspace to hold the files.
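If you'd rather use boto3 than the awscli for the sync step, a rough download-only equivalent of aws s3 sync might look like this; the bucket name, prefix, and local directory are placeholders:

import os
import boto3

BUCKET = "my-filings-bucket"       # placeholder
PREFIX = "filings/"                # placeholder
LOCAL_DIR = "/home/ec2-user/filings"

s3 = boto3.client("s3")
paginator = s3.get_paginator("list_objects_v2")

# With an IAM role attached to the instance, boto3 picks up credentials
# automatically; no keys need to be stored on the instance.
for page in paginator.paginate(Bucket=BUCKET, Prefix=PREFIX):
    for obj in page.get("Contents", []):
        key = obj["Key"]
        if key.endswith("/"):          # skip folder placeholder objects
            continue
        local_path = os.path.join(LOCAL_DIR, os.path.relpath(key, PREFIX))
        os.makedirs(os.path.dirname(local_path), exist_ok=True)
        s3.download_file(BUCKET, key, local_path)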

How to run python file in AWS S3 bucket from EC2?

I tried to run a Python file from AWS S3 storage like this:
python s3://test-bucket/test/py_s3_test.py
I'm getting the error:
python: can't open file 's3://test-bucket/test/py_s3_test.py': [Errno 2] No such file or directory
Is there any way to run a Python file that resides in AWS S3?
Thank you.
Try this one, it will work.
aws s3 cp s3://yourbucket/path/to/file/hello.py - | python
Explanation: it downloads the file from S3 and then pipes the stream to python for execution.
Alternatively, you could split it into multiple steps: download the file, save it locally, and execute the locally saved file.
Hope it helps!
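If you prefer the multi-step route in Python itself, a small boto3 sketch (using the bucket and key from the question) could be:

import subprocess
import boto3

s3 = boto3.client("s3")
# 1. Download the script from S3 to a local file.
s3.download_file("test-bucket", "test/py_s3_test.py", "/tmp/py_s3_test.py")
# 2. Execute the locally saved copy.
subprocess.run(["python", "/tmp/py_s3_test.py"], check=True)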

how to download S3 bucket files using Python

My requirement is to download files from an S3 bucket on a daily basis based on a date filter (e.g. day=2018-07-14). We are able to download successfully using the AWS CLI with the command below:
aws s3 cp s3://<bucketname>/day=2018-07-14 local_dir --recursive
But I would like to download using a Python script (maybe boto3). Can anyone suggest the steps to be taken, and especially the configuration steps (I am using a Windows machine), to download using .py scripts?
Thanks in advance.
import boto3
This will unlock the python functionality you desire:
https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/s3.html
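A minimal sketch of that CLI command in boto3, assuming you have already run aws configure on the Windows machine so boto3 can find your credentials; the bucket name and local directory below are placeholders:

import os
import boto3

bucket_name = "<bucketname>"            # placeholder, as in the CLI example
prefix = "day=2018-07-14/"              # the date filter
local_dir = r"C:\s3_downloads"          # placeholder local directory

s3 = boto3.resource("s3")
bucket = s3.Bucket(bucket_name)

# Iterate over every object under the day= prefix and download it,
# recreating the key's folder structure locally.
for obj in bucket.objects.filter(Prefix=prefix):
    if obj.key.endswith("/"):
        continue
    local_path = os.path.join(local_dir, obj.key.replace("/", os.sep))
    os.makedirs(os.path.dirname(local_path), exist_ok=True)
    bucket.download_file(obj.key, local_path)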
