I back up a Nextcloud instance, compress and encrypt it, and then I store it in an S3 bucket. When I look in the bucket I have a file called backup.tar.gz. Great.
Now I download the file from the bucket and I get a .tar file, which obviously renders my backup completely useless: I can't decrypt it. If I manually save the file with the extension .gz it still doesn't decrypt, but says tar: Error opening archive: Unrecognized archive format.
When I try to download it via the CLI it doesn't download the file but prints usage: aws [options] <command> <subcommand> [<subcommand> ...] [parameters] when I run:
aws s3 cp s3://mybucket/myfile.tar.gz
The problem is also mentioned here: https://forums.aws.amazon.com/thread.jspa?threadID=250926 where the only answer is, "why not use an AWS snapshot to back up your EC2 instance".
I don't get why AWS would do that. Is this a bug? Is there any decent way to get my backup out of the bucket in the format I uploaded it in?
I could also do this with a Python script if necessary, but I don't know whether that would work any better.
Help is very much appreciated! Thanks in advance!
I encrypt like this:
tar -czf - /path/to/file | openssl enc -e -aes256 -pbkdf2 -salt -out backup.tar.gz -pass pass:$ENCRYPTION_PASS
You can download a file from S3 using the awscli as follows:
aws s3 cp s3://mybucket/myfile.tar.gz .
Note the trailing period, which means 'copy to the current folder'.
Or you could rename the file during download, as follows:
aws s3 cp s3://mybucket/myfile.tar.gz xyz.tar.gz
It's possible, I suppose, that if you download a tar.gz using a browser or other tool (not the awscli) then that tool might try to be helpful and uncompress the GZ file for you in flight, resulting in a TAR file. But the awscli won't do that natively.
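If you want to check whether that is what is happening, you can look at the object's metadata: browsers transparently decompress a response whose Content-Encoding is gzip, so a gzip Content-Encoding on the object would explain ending up with a plain .tar. A minimal boto3 sketch (bucket and key names taken from the question, so adjust to yours) that prints what S3 has stored:

import boto3

# Inspect the object's metadata; a Content-Encoding of "gzip" means
# browsers will decompress the body on the fly while downloading.
s3 = boto3.client("s3")
head = s3.head_object(Bucket="mybucket", Key="myfile.tar.gz")
print("Content-Type:    ", head.get("ContentType"))
print("Content-Encoding:", head.get("ContentEncoding"))

The awscli copies the raw bytes either way, so the aws s3 cp commands above are the safer route.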
I have a Google Cloud Storage bucket. I can download objects using the download_blob function in Python, and I can also use gsutil cp or gcloud compute scp to download entire directories.
Is there a way to download an entire directory from the storage bucket as a single zip file using Python?
Doing it the way described in Python - download entire directory from Google Cloud Storage requires me to download the files one by one.
Is there a way to download an entire directory at once?
Cloud Storage has no concept of "directories" -- instead, each blob can be given a name that can resemble a directory path. Downloading entire "directories" is the same as downloading all blobs with the same prefix.
This means that you can use wildcards with gsutil:
gsutil -m cp gs://bucket/data/abc/* .
would copy every blob whose name starts with data/abc/.
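If you want to stay in Python rather than shell out to gsutil, a rough sketch using the google-cloud-storage client is below; it lists every blob under a prefix and packs them into one local zip file (the bucket name, prefix and zip name are just placeholders):

import os
import zipfile
from google.cloud import storage

client = storage.Client()
prefix = "data/abc/"

# Every blob whose name starts with the prefix goes into a single zip.
with zipfile.ZipFile("abc.zip", "w", zipfile.ZIP_DEFLATED) as archive:
    for blob in client.list_blobs("bucket", prefix=prefix):
        if blob.name.endswith("/"):  # skip "directory" placeholder objects
            continue
        archive.writestr(os.path.relpath(blob.name, prefix), blob.download_as_bytes())

It still fetches the blobs one by one under the hood (Cloud Storage has no server-side "zip a directory" operation), but you end up with a single zip file locally.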
I have a 150 GB file in S3. I would like to unzip it and upload each file back to S3. What is the best approach to do this with Python and EC2? I appreciate your response.
Download the file to your system.
Unzip it, which will create a normal folder, say "Unzipped_Folder".
Assuming you are using Windows, install the aws-cli on it.
Create an IAM user with S3 write access to that bucket and generate an Access Key and Secret Access Key.
In the aws-cli, add the credentials from the command prompt:
$ aws configure
Now run the following command to send the files to the S3 bucket:
$ aws s3 cp your_directory_path s3://your_bucket_name --recursive
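If you would rather do the whole thing from Python on the EC2 instance (the question asks about Python and EC2), here is a rough sketch of the same flow; bucket name, keys and paths are placeholders, and for a 150 GB archive the instance needs enough disk to hold both the zip and the extracted files:

import os
import zipfile
import boto3

s3 = boto3.client("s3")
bucket = "your_bucket_name"

# 1. Download the big zip file from S3.
s3.download_file(bucket, "path/to/archive.zip", "archive.zip")

# 2. Unzip it locally.
with zipfile.ZipFile("archive.zip") as archive:
    archive.extractall("Unzipped_Folder")

# 3. Upload every extracted file back to S3, preserving relative paths.
for root, _, files in os.walk("Unzipped_Folder"):
    for name in files:
        local_path = os.path.join(root, name)
        key = os.path.relpath(local_path, "Unzipped_Folder").replace(os.sep, "/")
        s3.upload_file(local_path, bucket, f"unzipped/{key}")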
I am running a Nextcloud instance and I am trying to download a directory with an API call using the requests library.
I can download a zip file using an API call. Now what I would like to do is have an unzipped directory on my Nextcloud instance and download it via an API call. It doesn't matter to me if I get a zip file back when I do the API call, I just want to have it unzipped in the cloud.
For instance, I can put an unzipped directory there and when I download it in my browser, Nextcloud gives me back a zip file. This is the behaviour I want from an API call.
Now if I put a zipped file on there I can download the file like so:
import os
import requests

# GET the (zipped) file from the Nextcloud WebDAV endpoint and save it locally.
response = requests.request(
    method="get",
    url=f"https://mycloud/remote.php/dav/files/setup_user/{name_dir}/",
    auth=("my_user", my_token),
)
if response.status_code == 200:
    with open("/path/to/my/dir/file_name.zip", "wb") as file:
        file.write(response.content)
That writes the zip file that is in the cloud to a local file_name.zip. My problem now is that if I have an unzipped directory in the cloud, it doesn't work. "Doesn't work" meaning that I get a file back which has the content:
This is the WebDAV interface. It can only be accessed by WebDAV clients such as the Nextcloud desktop sync client.
I also tried to do this with wget (wget --recursive --no-parent https://path/to/my/dir) and I got the same file with the same content back.
So I assume that the WebDAV API of Nextcloud doesn't allow me to do it in the way I want. Now I am wondering what I am doing wrong, or if what I want is doable at all. Also, I don't get why this works fine in the browser: I just select the unzipped folder and can download it with a click. In the Nextcloud community it has been suggested to use Rclone (https://help.nextcloud.com/t/download-complete-directory-from-nextcloud-instance/77828), but I would prefer not to use a dependency that I have to set up on every machine where I want to run this code.
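The only fallback I can think of is to skip the zip entirely: list the folder with a plain WebDAV PROPFIND and fetch the files one by one. An untested sketch, reusing the URL, name_dir and credentials from the snippet above:

import xml.etree.ElementTree as ET
import requests

auth = ("my_user", my_token)

# Ask WebDAV for the folder listing (Depth: 1 = direct children only).
listing = requests.request(
    method="PROPFIND",
    url=f"https://mycloud/remote.php/dav/files/setup_user/{name_dir}/",
    auth=auth,
    headers={"Depth": "1"},
)
listing.raise_for_status()

# Every <d:href> in the multistatus response is one entry of the folder.
for href in ET.fromstring(listing.content).iter("{DAV:}href"):
    path = href.text
    if path.endswith("/"):  # skip the folder itself and sub-folders
        continue
    file_name = path.rsplit("/", 1)[-1]
    content = requests.get(f"https://mycloud{path}", auth=auth).content
    with open(f"/path/to/my/dir/{file_name}", "wb") as file:
        file.write(content)

That would at least avoid the zip/unzip cycle, but it is not a single call like in the browser.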
Any help is very much appreciated! Thanks a bunch in advance!
PS: In case anyone wonders why I would want to do this: it's way more convenient when I want to change just a single file in my directory in the cloud. Otherwise I have to unzip, change the file, zip it, and upload it again.
I tried to run a Python file from AWS S3 storage like this:
python s3://test-bucket/test/py_s3_test.py
I'm getting this error:
python: can't open file 's3://test-bucket/test/py_s3_test.py': [Errno 2] No such file or directory
Is there any way to run a Python file that resides in AWS S3?
Thank you.
Try this one; it will work:
aws s3 cp s3://yourbucket/path/to/file/hello.py - | python
Explanation: it downloads the file from S3 and then pipes the stream to python for execution.
Alternatively, you could split it into multiple steps: download the file, save it locally, and execute the locally saved file.
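A rough boto3 sketch of that multi-step approach (the bucket and key are the ones from the question, the local path is a placeholder):

import subprocess
import sys
import boto3

s3 = boto3.client("s3")

# Download the script from S3 to a local file, then run it with the
# same Python interpreter that is executing this snippet.
s3.download_file("test-bucket", "test/py_s3_test.py", "/tmp/py_s3_test.py")
subprocess.run([sys.executable, "/tmp/py_s3_test.py"], check=True)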
Hope it helps!
My requirement is to download files from an S3 bucket on a daily basis based on a date filter (e.g. day=2018-07-14). We are able to download successfully using the AWS CLI with the command below:
aws s3 cp s3://<bucketname>/day=2018-07-14 local_dir --recursive
But I would want to download using a Python script (maybe boto3). Can anyone suggest the steps to be taken, and especially the configuration steps (I am using a Windows machine), to download using a .py script?
Thanks in advance.
import boto3
This will unlock the Python functionality you desire:
https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/s3.html
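For the configuration part: run aws configure (or set the standard AWS environment variables) on the Windows machine so boto3 can pick up your credentials, exactly as you would for the AWS CLI. Then a minimal sketch of the download itself, mirroring the aws s3 cp --recursive command above (bucket name, prefix and local directory are placeholders):

import os
import boto3

s3 = boto3.client("s3")
bucket = "bucketname"
prefix = "day=2018-07-14"
local_dir = r"C:\s3_downloads"

# Page through every object under the prefix and download each one.
paginator = s3.get_paginator("list_objects_v2")
for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
    for obj in page.get("Contents", []):
        key = obj["Key"]
        if key.endswith("/"):  # skip "folder" placeholder objects
            continue
        target = os.path.join(local_dir, *key.split("/"))
        os.makedirs(os.path.dirname(target), exist_ok=True)
        s3.download_file(bucket, key, target)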