Merging GCS files and writing the merged file to another GCS bucket using Python - python

I am able to read the files from a GCS bucket, but I am not able to merge the sample files and write the result to another GCS location.
I have to merge all the files in one folder of a GCS bucket and move that merged file to another GCS bucket location using Python.
Example: gs://hello/test
test contains 5 files, and the content of all 5 files should be merged and moved to another folder, say test1.
I managed to move a single file from one GCS bucket to another, but the use case is to merge all the files in one folder and move the merged file to the other GCS bucket.
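A minimal sketch of one way to do this with the google-cloud-storage client library, assuming hypothetical names where not given in the question (the destination bucket and merged object name below are made up). GCS can merge objects server-side with compose, but only within a single bucket, so the sketch composes a temporary object in the source bucket and then copies it across:

from google.cloud import storage

SRC_BUCKET = "hello"            # from the question
SRC_PREFIX = "test/"
DST_BUCKET = "hello-dest"       # hypothetical destination bucket
DST_NAME = "test1/merged.txt"   # hypothetical merged object name

client = storage.Client()
src_bucket = client.bucket(SRC_BUCKET)
dst_bucket = client.bucket(DST_BUCKET)

# List every object under the source prefix, skipping folder placeholders.
blobs = [b for b in client.list_blobs(SRC_BUCKET, prefix=SRC_PREFIX)
         if not b.name.endswith("/")]

# compose merges objects server-side (up to 32 sources per call), but only
# within one bucket, so merge into a temporary object first, then copy it.
merged = src_bucket.blob(SRC_PREFIX + "_merged_tmp")
merged.compose(blobs)
src_bucket.copy_blob(merged, dst_bucket, DST_NAME)
merged.delete()                 # clean up the temporary object

With only 5 files this stays well under the 32-source limit per compose call. Alternatively, you could download each blob, concatenate the contents in memory, and upload the result to the destination bucket.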

Related

Download all files in S3 bucket with any given number of subfolder levels with Boto3

I know there are threads such as this one that show how to use Boto3 to download all files, either in a bucket or in a folder (i.e. key prefix) directly under the bucket.
What would be a generic solution to download all files in a subfolder of any number of levels, as long as the S3 path is given? For example:
s3://bucket_name/level_1_folder/level_2_folder/.../level_N_folder/
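One hedged sketch with boto3: list_objects_v2 treats the key space as flat, so a single prefix listing already covers any number of "subfolder" levels (the bucket and prefix names below are placeholders taken from the example path):

import os
import boto3

BUCKET = "bucket_name"                        # placeholder from the example
PREFIX = "level_1_folder/level_2_folder/"     # any depth works the same way
LOCAL_DIR = "downloads"

s3 = boto3.client("s3")
paginator = s3.get_paginator("list_objects_v2")

# Every key under the prefix is returned, no matter how many "/" levels
# it contains, so no recursion is needed.
for page in paginator.paginate(Bucket=BUCKET, Prefix=PREFIX):
    for obj in page.get("Contents", []):
        key = obj["Key"]
        if key.endswith("/"):                 # skip folder placeholder objects
            continue
        local_path = os.path.join(LOCAL_DIR, key)
        os.makedirs(os.path.dirname(local_path), exist_ok=True)
        s3.download_file(BUCKET, key, local_path)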

S3 resource not listing key for some folders

I am trying to delete files (not folders) from multiple folders within an S3 bucket.
my code:
for archive in src.objects.filter(Prefix=f"Nightly/{folder}"):
    s3.Object(BUCKET, archive.key).delete()
When I do this, it deletes only the files for some folders (works fine), but for the other 2 folders it deletes the folder itself.
As the picture shows, I am listing the files in each folder.
The folders account, user, and Archive print an extra archive key (highlighted), but the folders opportunity and opphistory do not print a key for the folder. I would like to know why this key is not printed for these 2 folders, thanks.
There are no folders and files in S3. Everything is an object. So Nightly/user/ is an object, just like Nightly/opportunity/opportunity1.txt is an object.
The "folders" are only visual representation made by AWS console:
The console uses the key name prefixes (Development/, Finance/, and Private/) and delimiter ('/') to present a folder structure.
So your "folders account, user, Archive printing an extra archive (highlighted)" are just objects called Nightly/user/, Nightly/account/ and Nightly/Archive/. Such objects are created when you click "New folder" in the AWS console (you can use also AWS SDK or CLI to create them). Your other "files" don't have such folders, because these "files" weren't created like this. Instead they where uploaded to S3 under their full name, e.g. Nightly/opportunity/opportunity1.txt.

How do I delete multiple selected files from an S3 bucket given their S3 URLs?

I have an Excel sheet which contains the S3 file name & S3 static URL, and I want to delete these objects at once. The total number of files is 1000+, so any script I can use to delete these files would be helpful.
You could use the AWS Command-Line Interface (CLI) to delete the files, if you have AWS credentials with permission to delete the objects.
I would recommend that you add another column to the spreadsheet with a formula that inserts the file details to generate a command like this:
aws s3 rm s3://BUCKET-NAME/path/file.txt
You could then copy the command from Excel and paste it into the command line to tell the AWS CLI to delete the file. Test it on a few first, to make sure it works.
Then, use Fill Down and copy/paste all the commands into the command line.
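If you would rather script it in Python than paste commands, here is a rough boto3 sketch, assuming you first export the object keys from the spreadsheet into a plain text file with one key per line (the bucket and file names are placeholders):

import boto3

BUCKET = "BUCKET-NAME"          # placeholder, as in the CLI example above
KEY_LIST = "keys.txt"           # one object key per line, exported from Excel

s3 = boto3.client("s3")

with open(KEY_LIST) as f:
    keys = [line.strip() for line in f if line.strip()]

# delete_objects accepts up to 1000 keys per request, so send batches.
for i in range(0, len(keys), 1000):
    batch = keys[i:i + 1000]
    s3.delete_objects(
        Bucket=BUCKET,
        Delete={"Objects": [{"Key": k} for k in batch]},
    )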

How can I copy only changed files from one S3 Bucket to another one with python using boto3?

I want to copy files from one S3 bucket to another S3 bucket every x minutes, but of course I only want to update the files if they have changed. How can I achieve that with Python using boto3?
I would recommend using Amazon S3 replication, which can automatically copy objects from one bucket to another.
You can select which objects to copy by specifying a path or a tag.
It's all automatic.
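If you do want to drive it from Python instead, here is a hedged boto3 sketch that copies an object only when its ETag differs between the two buckets (bucket names are placeholders; note that ETags are not a reliable content hash for multipart uploads, so treat this as a heuristic):

import boto3
from botocore.exceptions import ClientError

SRC_BUCKET = "source-bucket"    # placeholder names
DST_BUCKET = "dest-bucket"

s3 = boto3.client("s3")
paginator = s3.get_paginator("list_objects_v2")

def dest_etag(key):
    # Return the destination object's ETag, or None if it does not exist yet.
    try:
        return s3.head_object(Bucket=DST_BUCKET, Key=key)["ETag"]
    except ClientError:
        return None

for page in paginator.paginate(Bucket=SRC_BUCKET):
    for obj in page.get("Contents", []):
        key = obj["Key"]
        if obj["ETag"] != dest_etag(key):     # new or changed object
            s3.copy_object(
                Bucket=DST_BUCKET,
                Key=key,
                CopySource={"Bucket": SRC_BUCKET, "Key": key},
            )

Run something like this from a scheduler (cron, EventBridge, etc.) every x minutes; for anything beyond simple cases, the replication approach above is the more robust option.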

How to upload a zip file to s3 using boto?

I zip a folder that has multiple subdirectories. When I upload it to S3 using boto, reading it like this:
zip_data = open(os.path.join(os.curdir, zip_file), 'rb').read()
then all files from all subdirectories are copied to the root directory; that is, no subdirectories exist in S3.
How do I upload a zip file of a folder to S3?
After running the command you show above, zip_data will contain the bytes contained in the zip file. If you then write that data to S3, you will get a single object (key) in S3 that contains that data. Is that what you want?
It sounds like you want the zip file to be expanded and all of the individual files and directories inside it to be stored in S3 as individual objects. If that is the case, you need to expand the zip file locally and then walk through the hierarchy and store each individual file in S3. You could use the s3put command line tool in boto to do this for you.
There is no way to get S3 itself to unpack the contents of a zip file for you automatically.
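If the goal is the second case, here is a sketch using boto3 (rather than the older boto library) that uploads each member of the archive as its own object, so the directory structure inside the zip becomes key prefixes in S3 (the bucket and file names are placeholders):

import zipfile
import boto3

BUCKET = "my-bucket"            # placeholder bucket name
ZIP_FILE = "folder.zip"         # placeholder archive name

s3 = boto3.client("s3")

with zipfile.ZipFile(ZIP_FILE) as archive:
    for member in archive.namelist():
        if member.endswith("/"):              # skip directory entries
            continue
        # Stream each member straight from the archive into S3.
        with archive.open(member) as fileobj:
            s3.upload_fileobj(fileobj, BUCKET, member)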
