I botched a Firebase cloud function and accidentally created 1.9 million images stored in gs://myapp.appspot.com//tmp/. That double slash is accurate: the server was writing to /tmp/, which I guess produces the path above.
I now want to delete those files (they're all nonsense). I tried using the Python wrapper like so:
export GOOGLE_APPLICATION_CREDENTIALS="../secrets/service-account.json"
Then:
from google.cloud import storage
storage_client = storage.Client()
bucket = storage_client.get_bucket('tmp')
blobs = bucket.list_blobs(bucket='tmp', prefix='')
for blob in blobs:
    print(' * deleting', blob)
    blob.delete()
But this throws:
google.api_core.exceptions.Forbidden: 403 GET https://storage.googleapis.com/storage/v1/b/tmp?projection=noAcl: firebase-adminsdk-yr6f8@myapp.iam.gserviceaccount.com does not have storage.buckets.get access to tmp.
Does anyone know how to allow the admin credentials to delete from /tmp/? Any pointers would be hugely helpful!
I was able to reproduce this problem with a gsutil command:
gsutil cp ~/<example-file> gs://<my-project-name>.appspot.com//tmp/
First of all, in my Firebase console I am able to delete the whole folder with one click; not sure if you have considered this.
Anyway, if you want to do it with the API, I have found the following solution.
I think (comparing with my test) the bucket name should be: myapp.appspot.com
If you print the blobs in Python you will get something like this: <Blob: <my-project-name>.appspot.com, /tmp/<example-file>, 1584700014023619>
The second value is the name property of the blob object. I noticed that in this situation the blob names start with /tmp/.
Code that works on my side is:
from google.cloud import storage
storage_client = storage.Client()
bucket = storage_client.get_bucket('myapp.appspot.com')
blobs = bucket.list_blobs()
for blob in blobs:
    if blob.name.startswith('/tmp/'):
        print(' * deleting', blob)
        blob.delete()
I don't think it's a very elegant solution, but for a one-time fix it may be good enough.
I hope it helps!
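A possible refinement (I have not tested it at this scale): list_blobs() also accepts a prefix argument, so the filtering can happen on the server instead of checking every name in Python:

from google.cloud import storage

storage_client = storage.Client()
bucket = storage_client.get_bucket('myapp.appspot.com')

# note the leading slash in the prefix, matching the odd '/tmp/...' blob names
for blob in bucket.list_blobs(prefix='/tmp/'):
    print(' * deleting', blob.name)
    blob.delete()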
Related
I thought it only returned the leaf blob files, and that was the case earlier, but now all of a sudden it returns the virtual directory as well. Did I have the wrong impression, or did something change overnight?
Assume I have the following structure
container/dir0/dir1/blob1.json
container/dir0/dir1/blob2.json
And the following code
blobs = blob_service_client.list_blobs('container', 'dir0/')
for blob in blobs:
    print(blob.name)
will return
dir0/dir1
dir0/dir1//blob1.json
dir0/dir1//blob2.json
instead of
dir0/dir1//blob1.json
dir0/dir1//blob2.json
Is there any way to avoid having the virtual directory returned as one of the blobs in the list?
After reproducing this on my end, I was able to get it done by using the BlockBlobService class.
pip install azure-storage-blob==2.1.0
You can use either the list_blobs() or the list_blob_names() method to list the blobs inside the specified container. Below is the complete code that worked for me.
from azure.storage.blob import BlockBlobService
ACCOUNT_NAME = "<ACCOUNT_NAME>"
CONTAINER_NAME = "<CONTAINER_NAME>"
SAS_TOKEN='<SAS_TOKEN>'
block_blob_service = BlockBlobService(account_name=ACCOUNT_NAME,account_key=None,sas_token=SAS_TOKEN)
print("\nList of blobs in "+CONTAINER_NAME+"\n")
generator = block_blob_service.list_blobs(CONTAINER_NAME,'dir0/')
for blob in generator:
    print(blob.name)
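The list_blob_names() variant mentioned above would look like this (an untested sketch using the same container and prefix):

generator = block_blob_service.list_blob_names(CONTAINER_NAME, 'dir0/')
for name in generator:
    print(name)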
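If you are on the newer azure-storage-blob v12 package instead of 2.1.0, a rough equivalent is sketched below (the account, container, and SAS token values are placeholders):

from azure.storage.blob import ContainerClient

container = ContainerClient(
    account_url="https://<ACCOUNT_NAME>.blob.core.windows.net",
    container_name="<CONTAINER_NAME>",
    credential="<SAS_TOKEN>",
)

# list_blobs() gives a flat listing filtered by prefix
for blob in container.list_blobs(name_starts_with="dir0/"):
    print(blob.name)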
I've been trying to send an image to Firebase Storage, but when it gets to the storage, Firebase can't render the image.
For now, the image is pure base64.
versions:
Python 3.10.6
firebase==3.0.1
firebase-admin==6.0.1
Flask==2.0.3
dontpad.com: link to the base64 string being used
Code:
from datetime import date

def filePath(folderPath):
    return f'{folderPath}/{date.today()}'

def fileUpload(file, folderPath):
    fileName = filePath(folderPath)
    from firebase_admin import storage
    bucket = storage.bucket()
    blob = bucket.blob(fileName)
    blob.upload_from_string(file, 'image/jpg')
    blob.make_public()
    return blob.public_url
Additional info will be provided if asked.
Expected: the uploaded image renders in Firebase Storage.
Result: Firebase can't render the uploaded file.
What did I try?
Alternative data objects to replace base64 have been considered in the project, but base64 is the only data I'm provided for the image, so alternative approaches have been discarded.
Most similar questions use JavaScript, which isn't my case, and they use different libraries with different methods and parameters, so they haven't helped.
Tried adding "data:image/jpeg;base64," to the start of the filename.
Tried replacing content type with "data_url" or "base64".
Tried uploading with and without the extension on the filename.
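For reference, here is a sketch of what decoding the base64 before uploading would look like, so the blob would hold raw JPEG bytes instead of base64 text (I haven't confirmed this fixes the rendering; the helper name and the data-URL stripping are illustrative assumptions):

import base64
from firebase_admin import storage

def fileUploadDecoded(b64_string, file_name):
    # strip a possible data-URL prefix such as "data:image/jpeg;base64,"
    if b64_string.startswith('data:') and ',' in b64_string:
        b64_string = b64_string.split(',', 1)[1]
    raw_bytes = base64.b64decode(b64_string)  # actual JPEG bytes
    bucket = storage.bucket()
    blob = bucket.blob(file_name)
    blob.upload_from_string(raw_bytes, content_type='image/jpeg')
    blob.make_public()
    return blob.public_url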
from pyrsgis import raster
from pyrsgis.convert import changeDimension
# Assign file names
greenareaband1 = 'Sentinel-2 (2)dense.tiff'
greenareaband2 = 'Sentinel-2 L1C (3)dense.tiff'
greenareaband3 = 'Sentinel-2 L1C (4)dense.tiff'
# Read the rasters as array
df, myimage = raster.read(greenareaband1, bands='all')
AttributeError: 'NoneType' object has no attribute 'ReadAsArray'
I keep getting this error, but I'm sure that I uploaded these images using
from google.colab import files
files.upload()
I had the same problem and discovered that I had made a mistake when assigning the file name. Maybe there is a mistake there for you as well, and thus the file is not recognized as a TIFF and cannot be read with ReadAsArray(). Hope that is the only problem.
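A quick way to check whether the file name or path is the problem (a small sketch reusing the file name from the question):

import os

greenareaband1 = 'Sentinel-2 (2)dense.tiff'

# if this prints False, GDAL cannot open the file and raster.read() ends up
# calling ReadAsArray() on None, which gives exactly this error
print(os.path.exists(greenareaband1))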
You have a couple of issues here. Having spaces and parentheses in your file name is the last thing you want in Python. Make sure that you have changed the working directory to where your file is, or provide the full path as a raw string (prefix it with 'r'). For example:
input_file = r'E:/path_to_your_file/raster_file.tif'
ds, data_arr = raster.read(input_file)
About working with Colab: I think the best option would be to upload your files to your Google Drive and then authenticate your Colab script to mount the drive. Then you just need to change the working directory like this:
# authenticate google drive
from google.colab import drive
drive.mount('/content/drive')

# change working directory
import os
os.chdir(r'/content/drive/My Drive/path_to_your_file')
Or, after mounting the drive simply do this:
input_file = r'/content/drive/My Drive/path_to_your_file/raster_file.tif'
ds, data_arr = raster.read(input_file)
I am trying to get blob information from a bucket, but I want to use wildcards in the blob name. Consider my bucket:
$ gsutil ls gs://myBucket/myPath/
gs://myBucket/myPath/
gs://myBucket/myPath/ranOn=2018-12-11/
gs://myBucket/myPath/ranOn=2018-12-12/
gs://myBucket/myPath/ranOn=2018-12-13/
gs://myBucket/myPath/ranOn=2018-12-14/
gs://myBucket/myPath/ranOn=2018-12-15/
gs://myBucket/myPath/ranOn=2019-02-18/
gs://myBucket/myPath/ranOn=2019-02-19/
gs://myBucket/myPath/ranOn=2019-02-20/
gs://myBucket/myPath/ranOn=2019-02-21/
Now, from the command line, I am able to do:
$ gsutil ls gs://myBucket/myPath/ranOn=2018*
gs://myBucket/myPath/
gs://myBucket/myPath/ranOn=2018-12-11/
gs://myBucket/myPath/ranOn=2018-12-12/
gs://myBucket/myPath/ranOn=2018-12-13/
gs://myBucket/myPath/ranOn=2018-12-14/
gs://myBucket/myPath/ranOn=2018-12-15/
and hence I can do the same for the size:
$ gsutil du -sh gs://myBucket/myPath/ranOn=2018*
2.7 G
Now, I want to do the same thing with the Python API. Here is what I tried:
from google.cloud import storage
storage_client = storage.Client()
bucket = storage_client.get_bucket('myBucket')
blob = bucket.get_blob('myPath/ranOn=2018*')
print('Size: {} bytes'.format(blob.size))
Size: None bytes
Why is this not working? How can I use wildcards in blob paths with the Python API?
Unfortunately, get_blob() is only for getting individual files, not multiple files.
You'll need to iterate over all the files that match the prefix and sum their sizes to get the total size.
blobs = bucket.list_blobs(prefix="myPath/ranOn=2018")
total = sum([blob.size for blob in blobs])
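Putting it together with the setup from the question (the bucket and path names are the placeholders used above):

from google.cloud import storage

storage_client = storage.Client()
bucket = storage_client.get_bucket('myBucket')

# the prefix plays the role of the 'ranOn=2018*' wildcard
blobs = bucket.list_blobs(prefix='myPath/ranOn=2018')
total = sum(blob.size for blob in blobs)
print('Total size: {} bytes ({:.2f} GiB)'.format(total, total / 2**30))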
(cross posted to boto-users)
Given an image ID, how can I delete it using boto?
You use the deregister() API.
There are a few ways of getting the image ID (e.g. you can list all images and search their properties, etc.).
Here is a code fragment which will delete one of your existing AMIs (assuming it's in the EU region):
import boto.ec2

connection = boto.ec2.connect_to_region('eu-west-1',
                                        aws_access_key_id='yourkey',
                                        aws_secret_access_key='yoursecret',
                                        proxy=yourProxy,
                                        proxy_port=yourProxyPort)

# This is a way of fetching the image object for an AMI, when you know the AMI id.
# Since we specify a single image (using the AMI id) we get a list containing a single image.
# You could add error checking and so forth ... but you get the idea.
images = connection.get_all_images(image_ids=['ami-cf86xxxx'])
images[0].deregister()
(edit): In fact, having looked at the online documentation for 2.0, there is another way.
Having determined the image ID, you can use the deregister_image(image_id) method of boto.ec2.connection, which amounts to the same thing, I guess.
With newer boto (Tested with 2.38.0), you can run:
import boto.ec2

ec2_conn = boto.ec2.connect_to_region('xx-xxxx-x')
ec2_conn.deregister_image('ami-xxxxxxx')
or
ec2_conn.deregister_image('ami-xxxxxxx', delete_snapshot=True)
The first will delete the AMI; the second will also delete the attached EBS snapshot.
For Boto2, see katriel's answer. Here, I am assuming you are using Boto3.
If you have the AMI (an object of class boto3.resources.factory.ec2.Image), you can call its deregister function. For example, to delete an AMI with a given ID, you can use:
import boto3
ec2 = boto3.resource('ec2')
ami_id = 'ami-1b932174'
ami = list(ec2.images.filter(ImageIds=[ami_id]).all())[0]
ami.deregister(DryRun=True)
If you have the necessary permissions, you should see a "Request would have succeeded, but DryRun flag is set" exception. To actually deregister the AMI, leave out DryRun and use:
ami.deregister() # WARNING: This will really delete the AMI
This blog post elaborates on how to delete AMIs and snapshots with Boto3.
The script deregisters the AMI and the snapshots associated with it. Make sure you have the right privileges to run this script.
Inputs: please pass the region and AMI ID(s) as inputs.
import boto3
import sys

def main(region, images):
    ec2 = boto3.client('ec2', region_name=region)
    snapshots = ec2.describe_snapshots(MaxResults=1000, OwnerIds=['self'])['Snapshots']
    # loop through the list of image IDs
    for image in images:
        print("====================\nderegistering {image}\n====================".format(image=image))
        amiResponse = ec2.deregister_image(DryRun=True, ImageId=image)
        # delete the snapshots whose description references this AMI ID
        for snapshot in snapshots:
            if snapshot['Description'].find(image) > 0:
                snap = ec2.delete_snapshot(SnapshotId=snapshot['SnapshotId'], DryRun=True)
                print("Deleting snapshot {snapshot} \n".format(snapshot=snapshot['SnapshotId']))

# region is argv[1]; AMI IDs are passed comma-separated in argv[2]
main(sys.argv[1], sys.argv[2].split(','))
Using the EC2.Image resource, you can simply call deregister():
Example:
import boto3
ec2res = boto3.resource('ec2')

for i in ec2res.images.filter(Owners=['self']):
    print("Name: {}\t Id: {}\tState: {}\n".format(i.name, i.id, i.state))
    i.deregister()
See this for using different filters:
What are valid values documented for ec2.images.filter command?
See also: https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/ec2.html#EC2.Image.deregister
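For example, a rough sketch of combining Owners with a name filter (the 'my-ami-*' pattern is only a placeholder):

import boto3

ec2res = boto3.resource('ec2')

# list only self-owned AMIs whose name matches a pattern
for image in ec2res.images.filter(
        Owners=['self'],
        Filters=[{'Name': 'name', 'Values': ['my-ami-*']}]):
    print(image.id, image.name, image.state)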