Hi and thank you for reading.
I'm new to GCP, and I still can't find a solution to my problem.
I've searched many topics, but no solution has helped me move on.
INPUT INFORMATION
I have files stored in my bucket in Cloud Storage.
These files can have any extension, but I need to select only the .zip files.
I want to write a Python script in App Engine that finds these zip files and unzips them into the same directory in Cloud Storage.
Below is a version of the script, which doesn't work:
from google.cloud import storage
from zipfile import ZipFile

def list_blobs(bucket_name):
    storage_client = storage.Client()
    blobs = storage_client.list_blobs(bucket_name)
    for blob in blobs:
        try:
            with ZipFile(f'{blob.name}', 'r') as zipObj:
                zipObj.extractall()
        except:
            print(f'{blob.name} not supported unzipping')

list_blobs('test_bucket_for_me_05_08_2021')
Output
proxy.txt not supported unzipping
test.zip not supported unzipping
I found a solution. The issue was that ZipFile(f'{blob.name}') tries to open a local file with that name, which doesn't exist in the App Engine environment; the blob has to be downloaded into memory first. The code below unzips the zips in the bucket:
from google.cloud import storage
from zipfile import ZipFile
from zipfile import is_zipfile
import io

storage_client = storage.Client()

def unzip_files(bucketname):
    bucket = storage_client.get_bucket(bucketname)
    blobs = storage_client.list_blobs(bucketname)
    for blob in blobs:
        file = bucket.blob(blob.name)
        try:
            # Download the blob into memory and check whether it is a zip archive
            zipbytes = io.BytesIO(file.download_as_string())
            if is_zipfile(zipbytes):
                with ZipFile(zipbytes, 'r') as selected_zip:
                    # Upload every member of the archive back to the bucket
                    for files_in_zip in selected_zip.namelist():
                        file_in_zip = selected_zip.read(files_in_zip)
                        blob_new = bucket.blob(files_in_zip)
                        blob_new.upload_from_string(file_in_zip)
        except:
            print(f'{blob.name} not supported')

unzip_files("test_bucket_for_me_05_08_2021")
Of course, I will modify this code, but this solution works.
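For example, one possible modification (just a sketch, not something I have run against the bucket) would extract each zip into the same "directory" (prefix) as the zip itself instead of the bucket root:

from google.cloud import storage
from zipfile import ZipFile, is_zipfile
import io
import posixpath

def unzip_files_in_place(bucketname):
    """Extract every zip in the bucket next to the zip, reusing its prefix."""
    storage_client = storage.Client()
    bucket = storage_client.get_bucket(bucketname)
    for blob in storage_client.list_blobs(bucketname):
        zipbytes = io.BytesIO(blob.download_as_string())
        if not is_zipfile(zipbytes):
            continue
        prefix = posixpath.dirname(blob.name)  # '' for zips at the bucket root
        with ZipFile(zipbytes, 'r') as selected_zip:
            for name in selected_zip.namelist():
                target = posixpath.join(prefix, name) if prefix else name
                bucket.blob(target).upload_from_string(selected_zip.read(name))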
Thanks, everyone, for your time and effort.
Related
I've written the following code to upload a file into Blob Storage using Python:
from azure.storage.blob import ContainerClient

blob_service_client = ContainerClient(account_url="https://{}.blob.core.windows.net".format(ACCOUNT_NAME),
                                      credential=ACCOUNT_KEY,
                                      container_name=CONTAINER_NAME)

blob_service_client.upload_blob("my_file.txt", open("my_file.txt", "rb"))
This works fine. I wonder how I can upload an entire folder, with all files and subfolders in it, while keeping the structure of my local folder intact?
After reproducing this on my end, I was able to achieve your requirement using the os module. Below is the complete code that worked for me.
import os
from azure.storage.blob import ContainerClient

dir_path = r'<YOUR_LOCAL_FOLDER>'

for path, subdirs, files in os.walk(dir_path):
    for name in files:
        fullPath = os.path.join(path, name)
        print("FullPath : " + fullPath)
        file = fullPath.replace(dir_path, '')
        fileName = file[1:len(file)]
        print("File Name :" + fileName)

        # Create a blob client using the local file name as the name for the blob
        blob_service_client = ContainerClient(account_url=ACCOUNT_URL,
                                              credential=ACCOUNT_KEY,
                                              container_name=CONTAINER_NAME)

        print("\nUploading to Azure Storage as blob:\n\t" + fileName)
        blob_service_client.upload_blob(fileName, open(fullPath, "rb"))
Below is the folder structure on my local machine:
├───g.txt
├───h.txt
├───Folder1
│   ├───z.txt
│   └───y.txt
└───Folder2
    ├───a.txt
    ├───b.txt
    └───SubFolder1
        ├───c.txt
        └───d.txt
RESULTS:
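One thing to watch (an assumption on my part, since I don't know which OS you run this on): on Windows, the fileName built with replace() keeps backslashes, which then appear literally in the blob names. A variant of the same loop that normalizes to forward slashes and reuses a single client, using the ACCOUNT_URL, ACCOUNT_KEY, CONTAINER_NAME and dir_path from the snippet above, could look like this:

import os
from azure.storage.blob import ContainerClient

# One client is enough; it can be reused for every upload.
container_client = ContainerClient(account_url=ACCOUNT_URL,
                                   credential=ACCOUNT_KEY,
                                   container_name=CONTAINER_NAME)

for path, subdirs, files in os.walk(dir_path):
    for name in files:
        full_path = os.path.join(path, name)
        # Blob name relative to the root folder, with forward slashes
        blob_name = os.path.relpath(full_path, dir_path).replace(os.sep, "/")
        with open(full_path, "rb") as data:
            container_client.upload_blob(blob_name, data)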
I am unable to load a tar.gz file from my local directory to an S3 bucket location. I've had no issues running the function below to upload csv files, but I am getting the error "Fileobj must implement read". I am using Boto3 and Python.
The tar_file variable is the file on my local drive to upload to the S3 bucket location.
import csv
import glob
import os
import tarfile
from datetime import date
from typing import Optional, Set
from io import BytesIO

import psycopg2
import boto3

from constants import (
    ARTIFACT_STORE,
    DB_HOST,
    DB_PASSWORD,
    DB_USER,
    EXCLUDED_TABLES,
    NIPR_DB_NAME,
    S3_ACCESS_KEY_ID,
    S3_SECRET_ACCESS_KEY,
    S3_ENDPOINT_URL,
    BUCKET_NAME
)

def upload_s3_file():
    tar_file = f"{ARTIFACT_STORE}/{date.today()}_cds.tar.gz"
    s3 = boto3.client('s3', endpoint_url=S3_ENDPOINT_URL, aws_access_key_id=S3_ACCESS_KEY_ID, aws_secret_access_key=S3_SECRET_ACCESS_KEY)

    with tarfile.open(tar_file, 'r:gz') as tar:
        s3.upload_fileobj(tar, BUCKET_NAME, tar_file)
When I run the code below to upload a generated csv file to the S3 bucket, I have no issues:
s3 = boto3.client('s3', endpoint_url=S3_ENDPOINT_URL, aws_access_key_id=ACCESS_KEY, aws_secret_access_key=SECRET_KEY)

with open("test.csv", "rb") as f:
    s3.upload_fileobj(f, BUCKET_NAME, "test")
The problem is that you're supposed to pass a file object to upload_fileobj, not a tarfile object.
with open(tar_file, 'rb') as tar:
    s3.upload_fileobj(tar, BUCKET_NAME, tar_file)
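One more thing to double-check (an assumption, since I don't know your intended key layout): tar_file holds the full local path, so it will also be used verbatim as the S3 key. If you only want the file name as the key, something like this would do it:

import os

key = os.path.basename(tar_file)  # just the archive's file name, not the full local path
with open(tar_file, 'rb') as f:
    s3.upload_fileobj(f, BUCKET_NAME, key)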
I have to upload a set of folders into a dedicated container in Azure Blob Storage.
I found this:
https://github.com/rahulbagal/upload-file-azure-sas-url
but it is just for uploading a file using a dedicated Blob SAS URI, and it works perfectly.
Is there any similar solution able to manage folder upload instead of a file upload?
Thank you in advance.
1. Please try to use this code (for folders with nested subfolders, see the recursive sketch after option 2):
import os
from azure.storage.blob import BlobServiceClient

account_url = "https://<storage-account-name>.blob.core.windows.net/"
sas_token = "<your-sas-token>"
blob_service_client = BlobServiceClient(account_url, sas_token)

container_name = "<your-container-name>"
container_client = blob_service_client.get_container_client(container_name)

local_path = "<your-folder-path>"
folder_name = "<your-folder-name>"

for files in os.listdir(local_path):
    with open(os.path.join(local_path, files), "rb") as data:
        blob_client = blob_service_client.get_blob_client(container=container_name, blob=folder_name + "/" + files)
        blob_client.upload_blob(data)
2. Or you can use azcopy to upload your folder:
For example:
azcopy copy '<folder-path>' 'https://<account-name>.blob.core.windows.net/<container-name>?<sas-token>' --recursive
For more details, you can refer to the official azcopy documentation.
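If the local folder contains nested subfolders, os.listdir in option 1 only sees the top level. A recursive variant (a sketch, assuming the same account URL, SAS token, and container placeholders as above) that keeps the subfolder structure in the blob names could look like this:

import os
from azure.storage.blob import BlobServiceClient

account_url = "https://<storage-account-name>.blob.core.windows.net/"
sas_token = "<your-sas-token>"
container_name = "<your-container-name>"
local_path = "<your-folder-path>"
folder_name = "<your-folder-name>"

blob_service_client = BlobServiceClient(account_url, sas_token)
container_client = blob_service_client.get_container_client(container_name)

for root, dirs, files in os.walk(local_path):
    for file_name in files:
        full_path = os.path.join(root, file_name)
        # Keep the sub-folder structure in the blob name, using forward slashes
        relative = os.path.relpath(full_path, local_path).replace(os.sep, "/")
        with open(full_path, "rb") as data:
            container_client.upload_blob(folder_name + "/" + relative, data)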
I'm just wondering, is there a way to extract a password-protected zip file from Azure Storage?
I tried using a Python Azure Function, to no avail, and had a problem reading the location of the file.
Would the file have to be stored in a shared location temporarily in order to achieve this?
Just looking for a bit of direction here; am I missing a step, maybe?
Regards,
James
Azure Blob Storage provides storage functionality only; there is no runtime environment in it to perform the unzip operation. So basically, we should download the .zip file to the Azure Function, unzip it, and upload the files inside the .zip one by one.
For a quick test, I wrote an HTTP-trigger Azure Function demo that unzips a password-protected zip file; it works for me locally:
import azure.functions as func
import uuid
import os
import shutil
from azure.storage.blob import ContainerClient
from zipfile import ZipFile

storageAccountConnstr = '<storage account conn str>'
container = '<container name>'

# define local temp paths; on Azure, paths under /home are recommended
tempPathRoot = 'd:/temp/'
unZipTempPathRoot = 'd:/unZipTemp/'

def main(req: func.HttpRequest) -> func.HttpResponse:
    reqBody = req.get_json()
    fileName = reqBody['fileName']
    zipPass = reqBody['password']

    container_client = ContainerClient.from_connection_string(storageAccountConnstr, container)

    # download zip file
    zipFilePath = tempPathRoot + fileName
    with open(zipFilePath, "wb") as my_blob:
        download_stream = container_client.get_blob_client(fileName).download_blob()
        my_blob.write(download_stream.readall())

    # unzip to temp folder
    unZipTempPath = unZipTempPathRoot + str(uuid.uuid4())
    with ZipFile(zipFilePath) as zf:
        zf.extractall(path=unZipTempPath, pwd=bytes(zipPass, 'utf8'))

    # upload all files in temp folder
    for root, dirs, files in os.walk(unZipTempPath):
        for file in files:
            filePath = os.path.join(root, file)
            destBlobClient = container_client.get_blob_client(fileName + filePath.replace(unZipTempPath, ''))
            with open(filePath, "rb") as data:
                destBlobClient.upload_blob(data, overwrite=True)

    # remove all temp files
    shutil.rmtree(unZipTempPath)
    os.remove(zipFilePath)

    return func.HttpResponse("done")
Files in my container:
Result:
Using a blob trigger would be better for this, as an HTTP trigger will hit time-out errors if your zip file is huge.
Anyway, this is only a demo that shows you how to do this.
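For reference, a blob-triggered version of the function would start from a signature like this (just a sketch; the blob binding itself is configured in function.json, which I haven't shown):

import azure.functions as func

def main(myblob: func.InputStream):
    # myblob.name is the path of the blob that fired the trigger,
    # and myblob.read() returns its bytes; the explicit download step above
    # is no longer needed, but the unzip-and-upload logic stays the same.
    zip_bytes = myblob.read()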
I am trying to set up a Cloud Function to move files between folders inside one bucket in GCP.
Whenever the user loads files into the provided bucket folder, my Cloud Function should move the file to another folder that the big data scripts watch.
The deployment shows as successful; however, the files are not moved from the source folder.
I appreciate your help.
from google.cloud import storage

def move_file(bucket_name, bucket_Folder, blob_name):
    """Moves a blob from one folder to another with the same name."""
    bucket_name = 'bucketname'
    blob_name = 'filename'

    storage_client = storage.Client()
    bucket = storage_client.get_bucket(bucket_name)
    source_blob = bucket.blob("Folder1/" + blob_name)
    new_blob = bucket.copy_blob(source_blob, bucket, "Folder2/" + blob_name)
    blob.delete()

    print('Blob {} in bucket {} copied to blob {} .'.format(source_blob.name, bucket.name, new_blob.name))
From the code you provided, the variable blob is not defined anywhere, so the source file won't be deleted. Instead of blob.delete(), change that line to source_blob.delete().
Also, I assume you are aware that you're "moving" just a single file. If you want to move all files prefixed with Folder1/ to Folder2, you could do something like this instead:
from google.cloud import storage

def move_files(self):
    storage_client = storage.Client()
    bucket = storage_client.get_bucket('bucketname')
    blobs = bucket.list_blobs(prefix='Folder1/')
    for blob in blobs:
        bucket.rename_blob(blob, new_name=blob.name.replace('Folder1/', 'Folder2/'))
For the latter, I reckon that there could be more efficient or better ways to do it.
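For instance, if the destination ever needs to be a different bucket, rename_blob won't cross buckets, but a copy-and-delete loop along the same lines would (a sketch, with hypothetical bucket names):

from google.cloud import storage

def move_files_between_buckets(source_bucket_name, dest_bucket_name):
    storage_client = storage.Client()
    source_bucket = storage_client.get_bucket(source_bucket_name)
    dest_bucket = storage_client.get_bucket(dest_bucket_name)
    for blob in source_bucket.list_blobs(prefix='Folder1/'):
        # copy the object to the destination, then delete the source to complete the "move"
        source_bucket.copy_blob(blob, dest_bucket, blob.name.replace('Folder1/', 'Folder2/'))
        blob.delete()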
If you are just moving the object inside the same bucket, you can simply rename the object with the desired path.
In Google Cloud Platform Storage there are no folders, just the illusion of them. Everything after the name of the bucket is part of the name of the object.
Also, I can see several errors in your function. You can use this generic function to move a blob from one folder to another inside the same bucket:
from google.cloud import storage

def rename_blob(bucket_name, blob_name, new_name):
    """Renames a blob."""
    # bucket_name = "your-bucket-name"
    # blob_name = "folder/myobject"
    # new_name = "newfolder/myobject"

    storage_client = storage.Client()
    bucket = storage_client.bucket(bucket_name)
    blob = bucket.blob(blob_name)
    new_blob = bucket.rename_blob(blob, new_name)

    print("Blob {} has been renamed to {}".format(blob.name, new_blob.name))