I would like to know if there is a way to upload all the files contained in a folder to MinIO, or if there is a method already implemented for this.
Uploading one file is very simple, but I can't find a way to upload several files that are inside a local directory.
You can try this out:
import glob
import os

def upload_local_directory_to_minio(local_path: str, bucket_name: str):
    assert os.path.isdir(local_path)
    for local_file in glob.glob(local_path + '/**'):
        local_file = local_file.replace(os.sep, "/")
        if not os.path.isfile(local_file):
            # Recurse into subdirectories
            upload_local_directory_to_minio(local_file, bucket_name)
        else:
            # Build the object name relative to the directory being uploaded
            remote_path = local_file[1 + len(local_path):]
            remote_path = remote_path.replace(os.sep, "/")
            minioClient.fput_object(bucket_name, remote_path, local_file)
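Note that minioClient in the snippet above is assumed to be an instance of the MinIO Python SDK client. A minimal sketch of creating it and calling the helper (the endpoint, credentials, bucket name, and directory are placeholders):
from minio import Minio

# Hypothetical client setup; replace endpoint and credentials with your own
minioClient = Minio(
    "play.min.io",
    access_key="YOUR_ACCESS_KEY",
    secret_key="YOUR_SECRET_KEY",
    secure=True,
)

# Create the bucket if it doesn't exist yet, then upload the whole directory
if not minioClient.bucket_exists("my-bucket"):
    minioClient.make_bucket("my-bucket")
upload_local_directory_to_minio("path/to/local/dir", "my-bucket")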
I have an Excel sheet with metadata containing 3 fields (path, folder_structure, filename):
path: the path of the source file in the S3 source bucket
folder_structure: the new folder structure that needs to be created in the target bucket
filename: the filename the object needs to be renamed to after copying to the target bucket
I have the code below working with a Windows source folder: it creates the target folder structure and copies the data into it. I need to modify it to read from an S3 bucket and load into another S3 bucket.
code:
import pandas as pd
import os, shutil
from pathlib import Path

data = pd.read_excel(r'c:\data\sample_requirement.xlsx', engine='openpyxl')
root_dir = 'source'

for rec in range(len(data)):
    # Replace the '|' symbol with a backslash
    dire = data['folder_structure'][rec].replace('|', '\\')
    # Append the root directory to the folder structure
    directory = root_dir + '\\' + dire
    # Check if the path exists; skip if it does, otherwise create it
    if not os.path.exists(directory):
        os.makedirs(directory)
    # Path in the Excel sheet
    path = data['path'][rec]
    # Filename to rename to
    filename = data['filename'][rec]
    if not os.path.isfile(directory + filename):
        # Copy the file to the created path
        shutil.copy(path, directory)
        # Rename the file
        try:
            os.rename(directory + os.path.basename(path), directory + filename)
        except FileExistsError:
            print('File name already exists')
How about this: just add this to your code, replacing your source and destination buckets:
import boto3

s3 = boto3.resource('s3')
copy_source = {
    'Bucket': 'yoursourcebucket',
    'Key': 'yourkey'
}
s3.meta.client.copy(copy_source, 'nameofdestinationbucket', 'destinationkey')
It's good practice to follow the docs to understand the details of the code. Also note that there may be many other ways to perform the same operation, for example using the awscli: https://stackoverflow.com/a/32526487/13126651
Copy one file from one bucket to another:
s3 = boto3.client("s3")
s3.copy({"Bucket": SOURCE_BUCKET, "Key": SOURCE_KEY}, DESTINATION_BUCKET, DESTINATION_KEY)
Copy up to 1000 files from one bucket to another:
s3 = boto3.client("s3")
response = s3.list_objects_v2(
    Bucket=SOURCE_BUCKET,
    Prefix=SOURCE_PREFIX,
)  # Warning: not handling pagination -> will truncate after 1000 keys
for file in response['Contents']:
    s3.copy(
        {"Bucket": SOURCE_BUCKET, "Key": file['Key']},
        DESTINATION_BUCKET,
        "/".join([DESTINATION_PREFIX, file['Key']]),
    )
Copy more than 1000 files:
properly handle pagination when calling list_objects_v2:
loop while response['IsTruncated'] is true
use response['NextContinuationToken'] as the new ContinuationToken arg
(a sketch of this loop follows below)
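A minimal sketch of that pagination loop, assuming the same SOURCE_BUCKET, SOURCE_PREFIX, DESTINATION_BUCKET, and DESTINATION_PREFIX placeholders as above:
import boto3

s3 = boto3.client("s3")

kwargs = {"Bucket": SOURCE_BUCKET, "Prefix": SOURCE_PREFIX}
while True:
    response = s3.list_objects_v2(**kwargs)
    for obj in response.get("Contents", []):
        s3.copy(
            {"Bucket": SOURCE_BUCKET, "Key": obj["Key"]},
            DESTINATION_BUCKET,
            "/".join([DESTINATION_PREFIX, obj["Key"]]),
        )
    # Stop once the listing is no longer truncated
    if not response.get("IsTruncated"):
        break
    # Continue the listing where the previous page left off
    kwargs["ContinuationToken"] = response["NextContinuationToken"]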
I have a bucket in S3 called "sample-data". Inside the bucket I have folders labelled "A" to "Z".
Inside each alphabetical folder there are more files and folders. What is the fastest way to download an alphabetical folder and all its content?
For example --> sample-data/a/foo.txt,more_files/foo1.txt
In the above example the bucket sample-data contains a folder called a, which contains foo.txt and a folder called more_files, which contains foo1.txt.
I know how to download a single file. For instance, if I wanted foo.txt I would do the following:
s3 = boto3.client('s3')
s3.download_file("sample-data", "a/foo.txt", "foo.txt")
However, I am wondering if I can download the folder called a and all its contents entirely. Any help would be appreciated.
I think your best bet would be the awscli
aws s3 cp --recursive s3://mybucket/your_folder_named_a path/to/your/destination
From the docs:
--recursive (boolean) Command is performed on all files or objects under the specified directory or prefix.
EDIT:
To do this with boto3 try this:
import os
import errno
import boto3

client = boto3.client('s3')


def assert_dir_exists(path):
    try:
        os.makedirs(path)
    except OSError as e:
        if e.errno != errno.EEXIST:
            raise


def download_dir(bucket, path, target):
    # Handle missing / at end of prefix
    if not path.endswith('/'):
        path += '/'
    paginator = client.get_paginator('list_objects_v2')
    for result in paginator.paginate(Bucket=bucket, Prefix=path):
        # Download each file individually
        for key in result['Contents']:
            # Calculate relative path
            rel_path = key['Key'][len(path):]
            # Skip paths ending in /
            if not key['Key'].endswith('/'):
                local_file_path = os.path.join(target, rel_path)
                # Make sure directories exist
                local_file_dir = os.path.dirname(local_file_path)
                assert_dir_exists(local_file_dir)
                client.download_file(bucket, key['Key'], local_file_path)


download_dir('your_bucket', 'your_folder', 'destination')
List all the objects in the folder you want to download, then iterate over them and download each file (see the sketch below).
import boto3

s3 = boto3.client("s3")
response = s3.list_objects_v2(
    Bucket=BUCKET,
    Prefix='DIR1/DIR2',
)
The response is of type dict. The key that contains the list of the file names is "Contents"
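As a rough sketch, iterating over that response and downloading each object could look like this (BUCKET, the prefix, and the local destination directory are placeholders, and pagination is not handled):
import os

for obj in response.get('Contents', []):
    key = obj['Key']
    if key.endswith('/'):
        continue  # skip "folder" placeholder keys
    # Mirror the key's path under a local destination directory
    local_path = os.path.join('destination', os.path.relpath(key, 'DIR1/DIR2'))
    os.makedirs(os.path.dirname(local_path), exist_ok=True)
    s3.download_file(BUCKET, key, local_path)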
Here is more information:
list all files in a bucket
boto3 documentation
I am not sure if this is the fastest solution, but it can help you.
Here is how I generate a ZIP and download it from the server; it works well in local development.
import os
import zipfile

from django.http import HttpResponse
from django.shortcuts import get_object_or_404

doc = get_object_or_404(Document, id=id_obj)
filepath = doc.file.path
filename = os.path.basename(doc.file.name)
directory = os.path.dirname(filepath)

xzip = zipfile.ZipFile(os.path.join(directory, "%s.zip" % filename), "w")
xzip.write(filepath, filename)
xzip.close()

zip_file = open(xzip.filename, 'rb')
response = HttpResponse(zip_file, content_type='application/zip')
response['Content-Disposition'] = 'attachment; filename="%s.zip"' % os.path.splitext(filename)[0]
return response
All my static & media files are uploaded to AWS in production, so I changed it a little bit:
# filepath becomes
filepath = settings.MEDIA_ROOT + "/" + doc.file.name
But when I try to download it, it gives me [Errno 2] No such file or directory with the link:
https://bucket_name.s3.amazonaws.com/media/public/files/file.pdf.zip
The settings that define MEDIA_ROOT are:
AWS_ACCESS_KEY_ID = config('AWS_ACCESS_KEY_ID')
AWS_SECRET_ACCESS_KEY = config('AWS_SECRET_ACCESS_KEY')
AWS_STORAGE_BUCKET_NAME = 'bucket_name'
AWS_S3_CUSTOM_DOMAIN = '%s.s3.amazonaws.com' % AWS_STORAGE_BUCKET_NAME
AWS_PUBLIC_MEDIA_LOCATION = 'media/public'
MEDIA_ROOT = "https://%s/%s/" % (AWS_S3_CUSTOM_DOMAIN, AWS_PUBLIC_MEDIA_LOCATION)
doc.file.path gives me the error 'This backend doesn't support absolute paths', which is why I changed it to MEDIA_ROOT + doc.file.name.
How can I download the generated zip file from AWS?
The file exists on S3, not the local file system. When you call those os.path.* functions, the code is trying to find the file on the local file system. It gives you that error because the S3 URL you pass as the path can't be mapped to anything on the local file system.
Why don't you let S3 serve the file directly to the end user's browser by simply returning a redirect response with the URL of the S3 file, instead of trying to read the file and return its contents in the response?
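A minimal sketch of that redirect approach, assuming django-storages (or a similar backend) is the default file storage so that doc.file.url resolves to the S3 URL; the view name here is hypothetical:
from django.shortcuts import get_object_or_404, redirect

def download_document(request, id_obj):
    # Send the browser straight to the S3 URL instead of streaming the file through Django
    doc = get_object_or_404(Document, id=id_obj)
    return redirect(doc.file.url)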
I have an FTP client developed with Python. When I specify a file in the current directory, it uploads successfully. I want to specify a directory other than the current directory. How could I modify this code?
from ftplib import FTP

ftp = FTP('')
ftp.connect("127.0.0.1", 1026)
ftp.login()
ftp.retrlines('LIST')


def uploadFile():
    filename = "f.txt"  # replace with your file in your home folder
    ftp.storbinary('STOR ' + filename, open(filename, 'rb'))
    print(ftp.storbinary)
    ftp.quit()
    print("filename", filename, "uploaded to server")


uploadFile()
Here I want to select the files from this directory: C:\Users\User\Desktop\nnn.
Any help would be highly appreciated.
Put the directory prefix in the path when calling open():
ftp.storbinary('STOR ' + filename, open(os.path.join(r'C:\Users\User\Desktop\nnn', filename), 'rb'))
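Applied to the uploadFile function from the question, that could look roughly like this (the directory is the one from the question; the filename is still a placeholder):
import os
from ftplib import FTP

ftp = FTP('')
ftp.connect("127.0.0.1", 1026)
ftp.login()

def uploadFile():
    local_dir = r'C:\Users\User\Desktop\nnn'
    filename = "f.txt"  # replace with the file you want to upload
    # Only the remote name goes into the STOR command; the full local path goes into open()
    with open(os.path.join(local_dir, filename), 'rb') as fp:
        ftp.storbinary('STOR ' + filename, fp)
    ftp.quit()
    print("filename", filename, "uploaded to server")

uploadFile()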
You can set the filename this way:
ftp.storbinary('STOR {0}.mrss'.format("Your file name"), file)
I'm trying to upload files from my local system to GCS using boto in the cloud. After a file gets uploaded, I get an error which says "The MD5 you specified in Content-MD5 or x-goog-hash did not match what we computed." Below is my code.
def upload():
    bucket_name = 'bucketname'
    bucket = conn.get_bucket(bucket_name)
    fpic = Key(bucket)
    d = 'E:/Eclipse/workspace/Files'
    for filename in os.listdir(d):
        contents = d + '/' + filename
        fpic.key = 'my-files' + filename
        fpic.set_contents_from_filename(contents, {}, replace=True)
There is also another way to upload files from local to GCS using boto; find the link below [1]. Try that, it should work for you without the error.
[1] https://cloud.google.com/storage/docs/gspythonlibrary#credentials
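Based on the upload pattern described in that documentation, the loop from the question might look roughly like this with boto's storage_uri interface (the bucket name, object prefix, and directory are taken from the question; treat the exact calls as an assumption to verify against the docs):
import os
import boto

bucket_name = 'bucketname'
d = 'E:/Eclipse/workspace/Files'

for filename in os.listdir(d):
    local_path = d + '/' + filename
    # Build a gs:// URI for the destination object and upload the file to it
    dst_uri = boto.storage_uri(bucket_name + '/' + 'my-files' + filename, 'gs')
    with open(local_path, 'rb') as fh:
        dst_uri.new_key().set_contents_from_file(fh)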