I have managed to create a simple app which deletes (bypassing the recycle bin) any files I want. It can also upload files. The problem I am having is that I cannot specify which collection the new file should be uploaded to.
def UploadFile(folder, filename, local_file, client):
    print "Upload Resource"
    # Build the document metadata and wrap the local file in a MediaSource.
    doc = gdata.docs.data.Resource(type='document', title=filename)
    path = _GetDataFilePath(local_file)
    media = gdata.data.MediaSource()
    media.SetFileHandle(path, 'application/octet-stream')
    create_uri = gdata.docs.client.RESOURCE_UPLOAD_URI + '?convert=false'
    # This is the problem: folder is just the collection's name (a string),
    # but CreateResource expects a collection resource object here.
    collection_resource = folder
    upload_doc = client.CreateResource(doc, create_uri=create_uri, collection=collection_resource, media=media)
    print 'Created, and uploaded:', upload_doc.title, doc.resource_id
From what I understand, the CreateResource function requires a resource object representing the collection. How do I get this object? The variable folder is currently just the string 'daily', which is the name of the collection; it is this variable that I need to replace with the collection resource.
From various sources, snippets and generally stuff all over the place, I managed to work this out. You need to pass a URI to the FindAllResources function (one which I found no mention of in the sample code from gdata).
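For reference, here is a minimal sketch of that lookup. It assumes FindAllResources accepts a uri keyword as described above and that the collection title is unique; the feed URI and the GetCollectionResource helper are illustrative, not taken from the gdata samples.

# Hedged sketch: find the collection resource by title, then pass it to UploadFile.
# The feed URI below is an assumption based on the Documents List API query syntax.
def GetCollectionResource(client, title):
    uri = ('/feeds/default/private/full/-/folder'
           '?title=' + title + '&title-exact=true')
    collections = client.FindAllResources(uri=uri)
    return collections[0] if collections else None

collection = GetCollectionResource(client, 'daily')
if collection is not None:
    UploadFile(collection, 'report.txt', 'report.txt', client)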
I have written up in more detail how I managed to upload, delete (bypassing the bin), search for and move files into collections here.
I have a bunch of folders in Dropbox with pictures in them, and I'm trying to get a list of URLs for all of the pictures in a specific folder.
import requests
import json
import dropbox

TOKEN = 'my_access_token'
dbx = dropbox.Dropbox(TOKEN)

for entry in dbx.files_list_folder('/Main/Test').entries:
    # print(entry.name)
    print(entry.file_requests.FileRequest.url)
    # print(entry.files.Metadata.path_lower)
    # print(entry.file_properties.PropertyField)
Printing the entry name correctly lists all of the file names in the folder, but everything else says 'FileMetadata' object has no attribute 'get_url'.
The files_list_folder method returns a ListFolderResult, where ListFolderResult.entries is a list of Metadata. Files in particular are FileMetadata.
Also, note that you aren't guaranteed to get everything back from files_list_folder method, so make sure you implement files_list_folder_continue as well. Refer to the documentation for more information.
The kind of link you mentioned is a shared link. FileMetadata objects don't themselves contain a link like that. You can get the path from path_lower though. For example, in the for loop in your code, that would look like print(entry.path_lower).
You should use sharing_list_shared_links to list existing links, and/or sharing_create_shared_link_with_settings to create shared links for any particular file as needed.
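Putting both pieces together, a rough sketch (the folder path and token are the placeholders from the question; note that sharing_create_shared_link_with_settings raises an ApiError if a link for the file already exists, which is simply skipped here):

import dropbox

TOKEN = 'my_access_token'  # placeholder
dbx = dropbox.Dropbox(TOKEN)

def iter_entries(path):
    # Page through the folder with files_list_folder / files_list_folder_continue.
    result = dbx.files_list_folder(path)
    while True:
        for entry in result.entries:
            yield entry
        if not result.has_more:
            break
        result = dbx.files_list_folder_continue(result.cursor)

for entry in iter_entries('/Main/Test'):
    if isinstance(entry, dropbox.files.FileMetadata):
        print(entry.path_lower)
        try:
            # Create a shared link for this file and print its URL.
            link = dbx.sharing_create_shared_link_with_settings(entry.path_lower)
            print(link.url)
        except dropbox.exceptions.ApiError:
            # A link may already exist; ignored here for brevity.
            pass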
I've created Python code to create a range of folders and subfolders (for a data lake) in an Azure storage container. The code works and is based on the documentation on Microsoft Azure. One thing though is that I'm creating a dummy 'txt' file in the folders in order to create the directory (which I can clean up later). I was wondering if there's a way to create the folders and subfolders without creating a file. I understand that the folders in Azure container storage are not hierarchical and are instead just metadata, so what I'm asking for may not be possible?
connection_string = config['azure_storage_connectionstring']
gen2_container_name = config['gen2_container_name']
container_client = ContainerClient.from_connection_string(connection_string, gen2_container_name)
blob_service_client = BlobServiceClient.from_connection_string(connection_string)
# blob_service_client.create_container(gen2_container_name)

def create_folder(folder, sub_folder):
    blob_client = container_client.get_blob_client('{}/{}/start_here.txt'.format(folder, sub_folder))
    with open('test.txt', 'rb') as data:
        blob_client.upload_blob(data)

def create_all_folders():
    config = load_config()
    folder_list = config['folder_list']
    sub_folder_list = config['sub_folder_list']
    for folder in folder_list:
        for sub_folder in sub_folder_list:
            try:
                create_folder(folder, sub_folder)
            except Exception as e:
                print('Looks like something went wrong here trying to create this folder structure {}/{}. Maybe the structure already exists?'.format(folder, sub_folder))
No, for Blob Storage this is not possible. There is no way to create so-called "folders" on their own.
But you can use the Data Lake SDK to create a directory, like this:
from azure.storage.filedatalake import DataLakeServiceClient

connect_str = "DefaultEndpointsProtocol=https;AccountName=0730bowmanwindow;AccountKey=xxxxxx;EndpointSuffix=core.windows.net"
datalake_service_client = DataLakeServiceClient.from_connection_string(connect_str)

myfilesystem = "test"
myfolder = "test1111111111"
myfile = "FileName.txt"

# Get a client for the file system (container) and create the directory in it.
file_system_client = datalake_service_client.get_file_system_client(myfilesystem)
directory_client = file_system_client.create_directory(myfolder)
Just to add some context, the reason this is not possible in Blob Storage is that folders/directories are not "real". Folders do not exist as standalone objects, they are only defined as part of a blob name.
For example, if you have a folder "mystuff" with a file (blob) "somefile.txt", the blob name actually includes the folder name and "/" character like mystuff/somefile.txt. The blob exists directly inside the container, not inside a folder. This naming convention can be nested many times over in a blob name like folder1/folder2/mystuff/anotherfolder/somefile.txt, but that blob still only exists directly in the container.
Folders can appear to exist in certain tooling (like Azure Storage Explorer) because the SDK permits blob name filtering: if you filter on the "/" character, you can mimic the appearance of a folder and its contents. But in order for a folder to even appear to exist, there must be a blob in the container with the appropriate name. If you want to "force" a folder to exist, you can create a 0-byte blob with the correct folder path in the name, but the blob artifact will still need to exist.
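For illustration, a minimal sketch of that 0-byte approach with the azure-storage-blob SDK (the connection string, container and folder names are placeholders):

from azure.storage.blob import ContainerClient

connection_string = "..."  # placeholder
container_client = ContainerClient.from_connection_string(connection_string, "my-container")

# Upload a 0-byte blob whose name contains the "folder" path. The folder only
# appears to exist because this blob's name includes that prefix.
container_client.upload_blob(name="folder1/folder2/.placeholder", data=b"")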
The exception is Azure Data Lake Storage (ADLS) Gen 2, which is Blob Storage that implements a Hierarchical Namespace. This makes it more like a file system and so respects the concept of Directories as standalone objects. ADLS is built on Blob Storage, so there is a lot of parity between the two. If you absolutely must have empty directories, then ADLS is the way to go.
The context is that I need to end the process if a file stored in Google Cloud Platform is empty, but if it isn't empty, follow the normal workflow. I'm doing this with a branch operator in Airflow, but I have to pass a condition to decide if the process needs to end there or continue.
So my question is: how can I get the size of a flat file stored in a bucket in GCP?
Thanks in advance!
You can use the Blobs/Objects built-in functions from the Google Cloud Storage library for Python.
In order to check if a file is inside your bucket and its size is greater than zero, I have created the following code:
from google.cloud.storage import Blob
from google.cloud import storage

client = storage.Client()
bucket = client.bucket('bucket_name')
desired_file = "file_name.csv"

for blob in bucket.list_blobs():
    if desired_file == blob.name and blob.size > 0:
        print("Name: " + blob.name + " Size blob obj: " + str(blob.size) + " bytes")
        # do something
Above, the list_blobs() method was used to list all the files inside the specified bucket. Then, we used blob.name to retrieve the file's name and blob.size in order to return the file's size in BYTES. After this small chunk of code you can continue your tasks.
Additional information: It is also possible to filter the files you list with a prefix, in case there is a huge amount of them, for example: for blob in client.bucket('bucket_name').list_blobs(prefix='test_'):
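Alternatively, as a sketch (the bucket and file names are placeholders), you can fetch a single object's metadata directly instead of listing the whole bucket:

from google.cloud import storage

client = storage.Client()
bucket = client.bucket('bucket_name')

# get_blob() fetches the blob's metadata (including size) or returns None if
# the object does not exist.
blob = bucket.get_blob('file_name.csv')
if blob is not None and blob.size > 0:
    print("Name: " + blob.name + " Size: " + str(blob.size) + " bytes")
    # continue the normal workflow
else:
    print("File is missing or empty, ending the process")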
UPDATE:
In order to give more fine-grained permissions regarding specific buckets and objects, you can use Access Control Lists. They allow you to define access to particular buckets and objects according to a desired access level. Thus, go to: Storage > Bucket > Click on your file > Click on EDIT PERMISSIONS (upper mid screen, next to DOWNLOADS) > Add item. Then, select the entity you want to add (Project, Domain, Group, User) and fill in the name (email id, project, service account). Link for "How to use ACLs" from Google.
I'm trying to zip a few files from Google Storage.
Python's zipfile doesn't find the files in gcloud, just in the project.
How can I make my code find the files in gcloud?
zip_buffer = io.BytesIO()
with zipfile.ZipFile(zip_buffer, 'w') as zip_file:
    for revenue in revenues:
        # queryset with a lot of files, so, for each file, add it to the zip
        t = tempfile.NamedTemporaryFile()
        t.write(revenue.revenue.name)
        if revenue.revenue.name:
            t.seek(0)
            with default_storage.open(revenue.revenue.name, "r") as file_data:
                zip_file.write(file_data.name, compress_type=zipfile.ZIP_DEFLATED)
                # the code doesn't get past this part
        t.close()
response = HttpResponse(content_type='application/x-zip-compressed')
response['Content-Disposition'] = 'attachment; filename=my_zip.zip'
response.write(zip_buffer.getvalue())
return response
In this part, I write the file that I opened from gcloud, but it stops inside this function:
def write(self, filename, arcname=None, compress_type=None):
    """Put the bytes from filename into the archive under the name
    arcname."""
    if not self.fp:
        raise RuntimeError(
            "Attempt to write to ZIP archive that was already closed")
    st = os.stat(filename)
    # when I try to find the file, the command os.stat searches in the project, not in gcloud
the "os.stat(filename)" search for a file in project, how can I do for find in the gcloud?
I will post my findings as an answer, since I would like to comment about few things.
I have understood:
You are using the Python zipfile library to work with ZIP files.
You are finding files locally and adding them one by one into the ZIP file.
You would like to do the same for files located in a Google Cloud Storage bucket, but it is failing to find the files.
If I have misunderstood the use-case scenario, please elaborate further in a comment.
However, if this is exactly what you are trying to do, then this is not supported. In the StackOverflow Question - Compress files saved in Google cloud storage, it is stated that compressing files that are already in the Google Cloud Storage is not possible. The solution in that question is to subscribe to newly created files and then download them locally, compress them and overwrite them in GCS. As you can see, you can list the files, or iterate through the files stored in GCS, but you first need to download them to be able to process them.
Workaround
Therefore, in your use-case scenario, I would recommend the following workaround, by using the Python client API:
You can use Listing objects Python API, to get all the objects from GCS.
Then you can use Downloading objects Python API, to download the objects locally.
As soon as the objects are located in local directory, you can use the zipfile Python library to ZIP them together, as you are already doing it.
Then the objects are zipped, and if you no longer need the downloaded objects, you can delete them with os.remove("downloaded_file.txt").
In case you need to have the compressed ZIP file in the Google Cloud Storage bucket, then you can use the Uploading objects Python API to upload the ZIP file in the GCS bucket.
As I have mentioned above, processing files (e.g. adding them to a ZIP file, etc.) directly in a Google Cloud Storage bucket is not supported. You first need to download them locally in order to do so. I hope that my workaround is going to be helpful to you.
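As a rough sketch of those steps with the google-cloud-storage client (the bucket name, local directory and ZIP name are placeholders, and error handling is omitted):

import os
import zipfile
from google.cloud import storage

client = storage.Client()
bucket = client.bucket('my-bucket')  # placeholder bucket name

local_dir = 'temp'
os.makedirs(local_dir, exist_ok=True)

# 1. List the objects in the bucket and 2. download them locally.
downloaded = []
for blob in bucket.list_blobs():
    if blob.name.endswith('/'):
        continue  # skip "directory" placeholder objects
    local_path = os.path.join(local_dir, os.path.basename(blob.name))
    blob.download_to_filename(local_path)
    downloaded.append(local_path)

# 3. Zip the downloaded files together.
with zipfile.ZipFile('zipedFile.zip', 'w') as zip_file:
    for local_path in downloaded:
        zip_file.write(local_path, compress_type=zipfile.ZIP_DEFLATED)

# 4. Clean up the local copies.
for local_path in downloaded:
    os.remove(local_path)

# 5. Optionally upload the ZIP back to the bucket.
bucket.blob('zipedFile.zip').upload_from_filename('zipedFile.zip')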
UPDATE
As I have mentioned above, zipping files while they are in GCS bucket is not supported. Therefore I have prepared for you a working example in Python on how to use the workaround.
NOTE: As I am not an expert at running OS commands from Python and I am not familiar with the zipfile library, there is probably a better and more efficient way of achieving this. However, the code that can be found in this GitHub link does the following procedures:
Under the #Public variables: section, change BUCKET_NAME to your corresponding bucket name and execute the Python script in Google Cloud Shell.
Now my bucket structure is as follows:
gs://my-bucket/test.txt
gs://my-bucket/test1.txt
gs://my-bucket/test2.txt
gs://my-bucket/directory/test4.txt
When executing the command, what the app does is the following:
It will get the path where the script is executed, e.g. /home/username/myapp.
It will create a temporary directory within this directory, e.g. /home/username/myapp/temp.
It will iterate through all the files located in the bucket that you have specified and will download them locally inside that temp directory.
NOTE: If a file in the bucket is under a directory, it will simply download the file instead of creating that sub-directory again. You can modify the code later to make it work as you desire.
So the new downloaded files will look like this:
/home/username/myapp/temp/test.txt
/home/username/myapp/temp/test1.txt
/home/username/myapp/temp/test2.txt
/home/username/myapp/temp/test4.txt
After that, the code will zip all those files into a new zipedFile.zip that will be located in the same directory as the main.py script that you have executed.
When this step is done as well, the script will delete the directory /home/username/myapp/temp/ with all of its contents.
As I have mentioned above, after executing the script locally, you should be able to see main.py and a zipedFile.zip file containing all the zipped files from the GCS bucket. Now you can take the idea of the implementation and modify it according to your project's needs.
The final code:
zip_buffer = io.BytesIO()
base_path = '/home/everton/compressedfiles/'
fiscal_compentecy_month = datetime.date(int(year), int(month), 1)
revenues = CompanyRevenue.objects.filter(company__pk=company_id, fiscal_compentecy_month=fiscal_compentecy_month)
if revenues.count() > 0:
    path = base_path + str(revenues.first().company.user.pk) + "/"
    zip_name = "{}-{}-{}-{}".format(revenues.first().company.external_id, revenues.first().company.external_name, month, year)
    for revenue in revenues:
        filename = revenue.revenue.name.split('revenues/')[1]
        if not os.path.exists(path):
            os.makedirs(path)
        with open(path + filename, 'wb+') as file:
            file.write(revenue.revenue.read())
            file.close()
    with zipfile.ZipFile(zip_buffer, 'w') as zip_file:
        for file in os.listdir(path):
            zip_file.write(path + file, compress_type=zipfile.ZIP_DEFLATED)
        zip_file.close()
    response = HttpResponse(content_type='application/x-zip-compressed')
    response['Content-Disposition'] = 'attachment; filename={}.zip'.format(zip_name)
    response.write(zip_buffer.getvalue())
    shutil.rmtree(path)
    return response
Is there any way to get the uploaded file's date and name, which we have stored into the database using forms?
Right now I am just creating two more database fields for the name and date, storing the name like this: file_name = request.FILES['file'].name, and storing the date using upload_date = datetime.datetime.now().
You can kind of get the date by reading the file's metadata using the stat module.
http://docs.python.org/release/2.5.2/lib/module-stat.html
It is OS specific, but ST_CTIME should give you approximately what you are looking for.
For the name, you can easily get it from the way you store the file. Specify a custom handler that stores the file at /your/file/path/filename.extension and just manipulate the string to get the filename.
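A short sketch of that idea (the path is a placeholder; note that st_ctime is creation time on Windows but metadata-change time on most Unix systems):

import datetime
import os

path = '/your/file/path/filename.extension'  # placeholder

st = os.stat(path)
created = datetime.datetime.fromtimestamp(st.st_ctime)
modified = datetime.datetime.fromtimestamp(st.st_mtime)
name = os.path.basename(path)

print(name, created, modified)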
Just read this in the Flask docs. Not sure how much of it is applicable in Django, but pasting it here for reference:
"If you want to know how the file was named on the client before it was uploaded to your application, you can access the filename attribute. However please keep in mind that this value can be forged so never ever trust that value. If you want to use the file-name of the client to store the file on the server, pass it through the secure_filename() function that Werkzeug provides for you."
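For example (Werkzeug is not part of Django and would have to be installed separately; this only shows the idea from the Flask docs, and the handle_upload helper is hypothetical):

from werkzeug.utils import secure_filename

def handle_upload(request):
    uploaded = request.FILES['file']
    # Never trust the client-supplied name; sanitize it before using it on disk.
    safe_name = secure_filename(uploaded.name)  # e.g. "../../../etc/passwd" -> "etc_passwd"
    return safe_name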
You can use the original file's name as part of the file name when storing it on disk, and you can probably use the file's creation/modification date for the upload date. IMO, you should just store them explicitly in the database.
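If you do store them explicitly, a minimal sketch of such a model (the model and field names are illustrative):

import os
from django.db import models

class UploadedFile(models.Model):
    file = models.FileField(upload_to='uploads/')
    original_name = models.CharField(max_length=255)
    uploaded_at = models.DateTimeField(auto_now_add=True)  # set once, on creation

    def save(self, *args, **kwargs):
        # Capture the original client-side file name on first save.
        if self.file and not self.original_name:
            self.original_name = os.path.basename(self.file.name)
        super(UploadedFile, self).save(*args, **kwargs)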