Using Google Cloud Storage Files with Jupyter Notebook on Cloud Compute - python

I am working on a machine learning project and I just set up a Google Cloud account.
I have a VM instance up and running and Jupyter is working. I placed a couple of folders of files in Google Cloud Storage, assuming I could connect them to my VM and use the files in a Jupyter notebook running Python 3.
I have not been able to find a way to access the files in Storage from my virtual machine. Someone help please!?

To access Cloud Storage from a VM, the VM needs to have been created with the appropriate API access scope. When you create the VM, a number of options are available under the Cloud API access scopes section; select the Storage permission to give your VM access to Cloud Storage.
Now that the VM has access to Storage, you can use the gsutil command to read data directly from a Cloud Storage bucket by referencing the bucket's name.
You can also extend access to the storage bucket to colleagues, should you wish, by doing the above. Access permissions for the project can be controlled via the IAM section of Google Cloud.
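If you prefer to stay inside the notebook, a minimal sketch using the google-cloud-storage Python client looks like the following; the bucket and object names are placeholders, and it assumes the VM's access scope (or attached service account) allows Storage access:
# Requires: pip install google-cloud-storage
from google.cloud import storage
# Uses the VM's attached service account via Application Default Credentials.
client = storage.Client()
bucket = client.bucket("your-bucket-name")  # placeholder bucket name
# List objects under a prefix, then download one for use in the notebook.
for blob in bucket.list_blobs(prefix="data/"):
    print(blob.name)
bucket.blob("data/train.csv").download_to_filename("train.csv")  # placeholder paths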

Related

What do I do with my local csv files when deploying Python Panel Application to Google Cloud

Hello, I am trying to deploy my Panel application with Google Cloud. I am just wondering how to deal with the local csv files I am importing as data frames. I use df = pd.read_csv("some/local/directory") to create my data frames, and I use functions like os.path.join. How do I handle this data when deploying? My data is all .csv files, around 7 GB in total.
Upload your csv files to Google Cloud Storage; they can then be accessed from other Cloud services.
Try the Python Client for Cloud Storage and refer to the example Reading and writing to Cloud Storage.
If you are running your application in a Compute Engine VM, one more option is Cloud Storage FUSE - you can mount the storage bucket as a file system in the VM.
Make sure to give storage bucket access to the Service Account you are using.
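As a rough sketch of the first option, assuming the files have already been uploaded to a bucket the service account can read (bucket and object names below are placeholders), pandas can read a gs:// path directly when the gcsfs package is installed:
# Requires: pip install pandas gcsfs
import pandas as pd
# Credentials come from the environment (Application Default Credentials
# or the attached service account).
df = pd.read_csv("gs://your-bucket-name/data/some_file.csv")  # placeholder path
print(df.head())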

Roles Required to write to Cloud Storage (GCP) from python (pandas)

I have a question for the GCP connoisseurs among you.
I have an issue where I can upload to a bucket via the UI and gsutil - but if I try to do this via python
df.to_csv('gs://BUCKET_NAME/test.csv')
I get a 403 insufficient permission error.
My guess at the moment is that Python does this via an API and requires an extra role. To make things more confusing, I am already project Owner of the project containing the bucket, and compared to other team members I did not really find any missing permissions for this specific bucket.
I use python 3.9.1 via pyenv and pandas '1.4.2'
Anyone had the same issue/ knows what role I am missing?
I checked that I have, in principle, the rights to upload both via the UI and gsutil.
I used the same virtual Python environment to read and write from BigQuery, to check that I can in principle use GCP data in Python - this works.
I have the following Roles on the Bucket
Storage Admin, Storage Object Admin, Storage Object Creator, Storage Object Viewer
gsutil and gcloud share credentials.
These credentials are not shared with other code running locally.
The quick-fix but sub-optimal solution is to:
gcloud auth application-default login
And run the code again.
It will then use your gcloud (gsutil) user credentials, and your code will run as if it were using a Service Account.
These credentials are stored (on Linux) in ${HOME}/.config/gcloud/application_default_credentials.json.
A better solution is to create a Service Account specifically for your app and grant it the minimal set of IAM permissions that it will need (BigQuery, GCS, ...).
For testing purposes (!) you can download the Service Account key locally.
You can then auth your code using Google's Application Default Credentials (ADC) by (on Linux):
export GOOGLE_APPLICATION_CREDENTIALS=/path/to/your/key.json
python3 your_app.py
When you deploy code that leverages ADC to a Google Cloud compute service (Compute Engine, Cloud Run, ...), it can be deployed unchanged because the credentials for the compute resource will be automatically obtained from the Metadata service.
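For reference, a minimal sketch of both approaches from pandas (which uses gcsfs/fsspec under the hood for gs:// paths); the bucket name and key path are placeholders, and passing a key file directly is only appropriate for local testing:
# Requires: pip install pandas gcsfs
import pandas as pd
df = pd.DataFrame({"a": [1, 2, 3]})
# Option 1: rely on Application Default Credentials
# (gcloud auth application-default login, or GOOGLE_APPLICATION_CREDENTIALS).
df.to_csv("gs://BUCKET_NAME/test.csv", index=False)
# Option 2: point gcsfs at a service-account key explicitly (testing only).
df.to_csv(
    "gs://BUCKET_NAME/test.csv",
    index=False,
    storage_options={"token": "/path/to/your/key.json"},  # placeholder key path
)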
You can Google e.g. "Google IAM BigQuery" to find the documentation that lists the roles:
IAM roles for BigQuery
IAM roles for Cloud Storage

Access file on google drive using Python Script

Please can anyone help me with how to fetch a file stored in Google Drive?
I have a Compute Engine VM in GCP and an associated service account. This service account has access to the Google Drive folder.
I thought of using a Python script on the VM which will access the file on Google Drive.
Not sure how to do this.
I guess you can try to impersonate the service account you are using.
Attaching a service account to a resource
For some Google Cloud resources, you can specify a user-managed service account that the resource uses as its default identity. This process is known as attaching the service account to the resource, or associating the service account with the resource.
When a resource needs to access other Google Cloud services and resources, it impersonates the service account that is attached to itself. For example, if you attach a service account to a Compute Engine instance, and the applications on the instance use a client library to call Google Cloud APIs, those applications automatically impersonate the attached service account.
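For illustration, a minimal sketch under those assumptions, using the Drive API with the attached service account; the file ID is a placeholder, and it assumes the instance's access scopes (or a key for the same service account) allow the Drive scope:
# Requires: pip install google-api-python-client google-auth
import io
import google.auth
from googleapiclient.discovery import build
from googleapiclient.http import MediaIoBaseDownload
# Uses the service account attached to the VM (assumes the Drive scope is allowed).
creds, _ = google.auth.default(scopes=["https://www.googleapis.com/auth/drive.readonly"])
drive = build("drive", "v3", credentials=creds)
# List the files the service account can see.
for f in drive.files().list(fields="files(id, name)").execute().get("files", []):
    print(f["id"], f["name"])
# Download one file by ID (placeholder ID).
request = drive.files().get_media(fileId="FILE_ID")
with io.FileIO("downloaded_file", "wb") as fh:
    downloader = MediaIoBaseDownload(fh, request)
    done = False
    while not done:
        _, done = downloader.next_chunk()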
Let me know if this was helpful.

How to upload files from local system to GCP Storage using Python in Cloud Shell

I have files on my local system, and I'm able to upload them to GCP Storage using Python by connecting to my GCP account with a JSON key file. I'm running this Python program manually in a command prompt, but I want to use the same program to read data from my local system or a shared drive. Any suggestion on how to connect to a local system path or shared drive path from a VM / Cloud Shell to upload files to GCP Storage?
Thanks
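For context, a minimal sketch of the kind of upload described above, using the google-cloud-storage client with a service-account JSON key; all paths and the bucket name are placeholders, and note that such a script only sees the filesystem of the machine it runs on:
# Requires: pip install google-cloud-storage
from google.cloud import storage
# Authenticate with a service-account JSON key (placeholder path).
client = storage.Client.from_service_account_json("/path/to/key.json")
bucket = client.bucket("your-bucket-name")  # placeholder bucket
# Upload a local file to the bucket (placeholder paths).
bucket.blob("uploads/data.csv").upload_from_filename("/local/path/data.csv")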

How to keep a folder synchronised with an Azure storage account in Python?

In AWS, a similar functionality exists using awscli as explained here. Does there exist a similar functionality in Azure using Python SDK or CLI? Thanks.
There are two services, Blob Storage & File Storage, in Azure Storage, but I don't know which of the Azure Storage services you want to synchronise with a folder, or which OS you are using.
As @Gaurav Mantri said, Azure File Sync is a good idea if you want to synchronise a folder with an Azure File Share on your on-premises Windows Server.
However, if you want to synchronise Azure Blobs, or you are using a Unix-like OS such as Linux/macOS, I think you can try to use Azure Storage Fuse for Blob Storage or a Samba client for File Storage, together with the rsync command, to achieve your needs.
First of all, the key point of the workaround solution is to mount the File/Blob service of Azure Storage as a local filesystem; then you can operate on it in Python or other ways just as you would locally, as below.
To mount a blob container as a filesystem, follow the installation instructions to install blobfuse, then configure & run the necessary file/script to mount a blob container of the Azure Storage account as the wiki page describes.
To mount a file share with a Samba client, please refer to the official document Use Azure Files with Linux.
Then, you can directly operate on all data in the blobfuse-mounted or Samba-mounted filesystem, or do the folder synchronisation with the rsync & inotify commands, or do other operations if you want.
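Alternatively, if you would rather stay in the Python SDK than mount anything, here is a rough one-way "push" sketch with the azure-storage-blob package; the connection string, container name and local path are placeholders:
# Requires: pip install azure-storage-blob
import os
from azure.storage.blob import BlobServiceClient
# Placeholder connection string (read from an environment variable) and container name.
service = BlobServiceClient.from_connection_string(os.environ["AZURE_STORAGE_CONNECTION_STRING"])
container = service.get_container_client("my-container")
local_dir = "/path/to/local/folder"  # placeholder
for root, _, files in os.walk(local_dir):
    for name in files:
        path = os.path.join(root, name)
        blob_name = os.path.relpath(path, local_dir).replace(os.sep, "/")
        with open(path, "rb") as data:
            # overwrite=True makes repeated runs behave like a one-way sync
            container.upload_blob(name=blob_name, data=data, overwrite=True)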
Hope it helps. Any concern, please feel free to let me know.
