I'm using selenium in order to extract some data (as a json file). This json is the final output of the script.
I've managed to do it locally so far in two different ways:
With a local webdriver (for Chrome).
With a Docker container.
However, I need it to be accessible from anywhere, on systems that have neither a webdriver nor Docker installed.
I have thought about deploying the script to Heroku and working from there, but I have no idea how to handle the data in this situation.
I think that cloud services are meant for these situations.
A storage account (S3 on Amazon or Blob Storage on Azure) lets you access the data from anywhere, with almost no limit on space, through its API or the provider's SDKs.
Also you can specify access policies if your data should not be publicly accessible.
As you have already packaged your script in a Docker container, you are ready to run it on almost any cloud provider (for example, by pushing the image to Amazon ECR and running it on Amazon ECS).
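As a rough illustration of handling the data, the script could push its final JSON to a storage bucket as its last step. This is a minimal sketch assuming Amazon S3 and the boto3 SDK; the helper names, the bucket name, and the object key are placeholders, not part of the original script.

```python
import json


def upload_json(s3_client, bucket, key, data):
    """Serialize `data` to JSON and store it as one object in S3."""
    body = json.dumps(data, indent=2).encode("utf-8")
    s3_client.put_object(
        Bucket=bucket,
        Key=key,
        Body=body,
        ContentType="application/json",
    )
    return len(body)  # number of bytes uploaded


def make_s3_client():
    """Create a real client; needs boto3 installed and AWS credentials configured."""
    import boto3  # imported lazily so upload_json has no hard dependency

    return boto3.client("s3")


# Hypothetical call at the end of the scraping script:
# upload_json(make_s3_client(), "my-scraper-output", "results/latest.json", scraped)
```

Any machine with credentials for the bucket can then read the object, whether or not it has a webdriver or Docker installed.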
Related
I have a python script that uses a python api to fetch data from a data provider, then after manipulating the data, writes some of that data to Google Sheets (via the Google Sheets python api). So my workflow to update the data is to open the python file in VS Code, and run it from there, then switch to the spreadsheet to see the updated data.
Is there a way to call this python script from Google Sheets using Google Apps Script? If so, that would be more efficient; I could link the GAS script to a macro button on the spreadsheet.
Apps Script runs in the cloud on Google's servers rather than on your computer, and has no access to local resources such as Python scripts on your system.
To call a resource in the cloud, use the URL Fetch Service.
Currently, I am using the following hack.
The Apps Script behind the sheet's command button writes parameters to a sheet called Parameters. The Python function on my local machine checks that Parameters sheet in the Google workbook every 5 seconds. If there are no parameters, it exits. If there are parameters, it executes the main code.
When the code is deployed on the service account, the polling portion remains inactive, and the Apps Script calls the Python code in the service account directly.
There are several reasons why I need to call a Python function on the LOCAL machine from the sheet. One is that debugging is easier on the local machine and cumbersome on the service account. Another is that certain files are kept on local machines and we do not want to move them to the workspace, yet the sheet needs data from these files.
This is a hack I am using.
I am looking for a better way than this "Python code keeps polling" method.
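For reference, the polling approach described above amounts to something like the following sketch. The two callables are hypothetical stand-ins: `read_parameters` would read the Parameters sheet and return a dict (empty when nothing is waiting), and `run_main` would execute the main code.

```python
import time


def poll_for_parameters(read_parameters, run_main, interval_seconds=5.0, max_checks=None):
    """Check for parameters repeatedly and run the main code whenever some are found.

    Returns how many times the main code ran. With max_checks=None the loop
    never ends, which matches a script left running on the local machine.
    """
    runs = 0
    checks = 0
    while max_checks is None or checks < max_checks:
        checks += 1
        params = read_parameters()
        if params:
            run_main(params)
            runs += 1
        time.sleep(interval_seconds)
    return runs
```

The cost of this design is a delay of up to one interval per request, plus one sheet read per check even when nothing has changed.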
Noob and beginner here. Just trying to learn the basics of GCP.
I have a series of Google Cloud buckets that contain text files. I also have a VM instance that I've set up via GCP.
Now, I'm trying to write some code to extract the data from Google buckets and run the script via GCP's command prompt.
How can I extract GCP buckets in Python?
I think that you can use the Listing Objects and Downloading Objects GCS methods with Python; this way, you can get a list of the objects stored in your Cloud Storage buckets and then download them to your VM instance. Additionally, keep in mind that the service account you use for these tasks must have the roles required to access your GCS buckets, and that you must provide its credentials to your application, either through environment variables or by explicitly pointing to your service account file in code.
Once you have your code ready, you can simply execute your Python program by using the python command. You can take a look at this link to get the instructions to install Python in your new environment.
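The list-then-download approach could look roughly like this. It is a sketch assuming the google-cloud-storage client library; `download_bucket` and `make_client` are illustrative names, and flattening object names into one directory is a simplification.

```python
import os


def download_bucket(client, bucket_name, dest_dir):
    """Download every object in a bucket into dest_dir and return the local paths."""
    os.makedirs(dest_dir, exist_ok=True)
    paths = []
    for blob in client.list_blobs(bucket_name):
        if blob.name.endswith("/"):
            continue  # skip zero-byte "folder" placeholder objects
        # Flatten object names such as "logs/a.txt" into a single file name.
        local_path = os.path.join(dest_dir, blob.name.replace("/", "_"))
        blob.download_to_filename(local_path)
        paths.append(local_path)
    return paths


def make_client():
    """Create a real client; requires `pip install google-cloud-storage`.

    Credentials come from the VM's service account or from the
    GOOGLE_APPLICATION_CREDENTIALS environment variable.
    """
    from google.cloud import storage  # lazy import

    return storage.Client()


# Hypothetical usage on the VM:
# download_bucket(make_client(), "my-text-bucket", "./downloads")
```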
I am writing a Python application which reads/parses a file of this kind.
myalerts.ini,
value1=1
value2=3
value3=10
value4=15
Currently I store this file on the local filesystem. To change it, I need physical access to that computer.
I want to move this file to cloud so that I can change this file anywhere (another computer or from phone).
If this application is running on some machine I should be able to change this file on the cloud and the application which is running on another machine which I don't have physical access to will be able to read updated file.
Notes,
I am new to both python and aws.
I am currently running it on my local mac/linux and planning on deploying on aws.
There are many options!
Amazon S3: This is the simplest option. Each computer could download the file at regular intervals or just before they run a process. If the file is big, the app could instead check whether the file has changed before downloading.
Amazon Elastic File System (EFS): If your applications are running on multiple Amazon EC2 instances, EFS provides a shared file system that can be mounted on each instance.
Amazon DynamoDB: A NoSQL database instead of a file. Much faster than parsing a file, but less convenient for updating values; you'd need to write a program to update values, e.g. from the command line.
AWS Systems Manager Parameter Store: A managed service for storing parameters. Applications (anywhere on the Internet) can request and update parameters. A great way to configure cloud-based applications!
If you are looking for minimal change and you want it accessible from anywhere on the Internet, Amazon S3 is the easiest choice.
Whichever way you go, you'll use the boto3 AWS SDK for Python.
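A minimal sketch of the S3 option, assuming the file keeps its current key=value layout and that every value is an integer. The function names are illustrative, and the bucket and key in the usage line are placeholders.

```python
import configparser


def parse_alerts(text):
    """Parse key=value lines that have no [section] header, like myalerts.ini."""
    parser = configparser.ConfigParser()
    # configparser insists on a section header, so prepend a dummy one.
    parser.read_string("[alerts]\n" + text)
    return {key: int(value) for key, value in parser["alerts"].items()}


def load_alerts(s3_client, bucket, key):
    """Fetch the file from S3 on each call so edits made elsewhere are picked up."""
    response = s3_client.get_object(Bucket=bucket, Key=key)
    return parse_alerts(response["Body"].read().decode("utf-8"))


# Hypothetical usage (requires boto3 and AWS credentials):
# import boto3
# settings = load_alerts(boto3.client("s3"), "my-config-bucket", "myalerts.ini")
```

Calling `load_alerts` at regular intervals, or just before each run, means the application sees a changed file without anyone touching the machine it runs on.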
I have a python script (on my local machine) that queries Postgres database and updates a Google sheet via sheets API. I want the python script to run on opening the sheet. I am aware of Google Apps Script, but not quite sure how can I use it, to achieve what I want.
Thanks
Google Apps Script runs on the server side, so it can't be used to run a local script.
You will need several changes. First, you need to move the script to the cloud (see Google Compute Engine) and be able to access your databases from there.
Then, from Apps Script, look at the onOpen trigger. From there you can use UrlFetchApp to call your Python server and start the work.
You could also add a custom "refresh" menu to the sheet to call your server, which is nicer than having to reload the sheet.
Note that onOpen runs server-side at Google, so it's impossible for it to access files on your local machine.
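On the Python side, the server that Apps Script calls can be very small. The sketch below uses only the standard library; the `/refresh` path, the port, and `refresh_sheet` (a placeholder for the existing Postgres-to-Sheets code) are assumptions made for illustration.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer


def refresh_sheet():
    """Placeholder for the existing work: query Postgres and update the sheet."""
    return "refreshed"


class RefreshHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        if self.path != "/refresh":
            self.send_error(404)
            return
        body = refresh_sheet().encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the example quiet


def serve(port=8080):
    """Block forever, serving POST /refresh on the given port."""
    HTTPServer(("", port), RefreshHandler).serve_forever()
```

A real deployment would also need HTTPS and some form of authentication (for example a shared secret header), since the URL has to be reachable from Google's servers.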
There's a GAE project using GCS to store/retrieve files. These files also need to be read by code that will run on GCE (it needs C++ libraries, so it cannot run on GAE).
In production, deployed on the actual GAE > GCS < GCE, this setup works fine.
However, testing and developing locally is a different story that I'm trying to figure out.
As recommended, I'm running GAE's dev_appserver with GoogleAppEngineCloudStorageClient to access the (simulated) GCS. Files are put in the local blobstore. Great for testing GAE.
Since there is no GCE SDK to run a VM locally, whenever I refer to the local 'GCE', it's just my local development machine running Linux.
On the local GCE side I'm just using the default boto library (https://developers.google.com/storage/docs/gspythonlibrary) with a Python 2.x runtime to interface with the C++ code and retrieve files from GCS. However, in development, these files are inaccessible from boto because they're stored in the dev_appserver's blobstore.
Is there a way to properly connect the local GAE and GCE to a local GCS?
For now, I gave up on the local GCS part and tried using the real GCS. The GCE part with boto is easy. The GAE part can also use the real GCS instead of the local blobstore by setting an access_token:
cloudstorage.common.set_access_token(access_token)
According to the docs:
access_token: you can get one by run 'gsutil -d ls' and copy the
str after 'Bearer'.
That token works for a limited amount of time, so that's not ideal. Is there a way to set a more permanent access_token?
There is a convenient option for accessing Google Cloud Storage from a development environment. You should use the client library provided with the Google Cloud SDK. After executing gcloud init locally you get access to your resources.
As shown in the client library authentication examples:
from googleapiclient import discovery
from oauth2client.client import GoogleCredentials

# Get the application default credentials. When running locally, these are
# available after running `gcloud init`. When running on compute
# engine, these are available from the environment.
credentials = GoogleCredentials.get_application_default()
# Construct the service object for interacting with the Cloud Storage API -
# the 'storage' service, at version 'v1'.
# You can browse other available api services and versions here:
# https://developers.google.com/api-client-library/python/apis/
service = discovery.build('storage', 'v1', credentials=credentials)
Google libraries come and go like tourists in a train station. Today (2020) google-cloud-storage should work on GCE and GAE Standard Environment with Python 3.
On GAE and GCE it picks up access credentials from the environment, and locally you can provide it with a service account JSON file like this:
GOOGLE_APPLICATION_CREDENTIALS=../sa-b0af54dea5e.json
If you're always using "real" remote GCS, the newer gcloud is probably the best library: http://googlecloudplatform.github.io/gcloud-python/
It's really confusing how many storage client libraries there are for Python. Some are for AE only, but they often force (or at least default to) using the local mock Blobstore when running with dev_appserver.py.
Seems like gcloud is always using the real GCS, which is what I want.
It also "magically" fixes authentication when running locally.
It looks like appengine-gcs-client for Python is now only useful for production App Engine and inside dev_appserver.py, and the local examples for it have been removed from the developer docs in favor of Boto :( If you are deciding not to use the local GCS emulation, it's probably best to stick with Boto for both local testing and GCE.
If you still want to use 'google.appengine.ext.cloudstorage' though, access tokens always expire, so you'll need to refresh them manually. Given your setup, honestly the easiest thing to do is just call 'gsutil -d ls' from Python and parse the output to get a new token from your local credentials. You could use the API Client Library to get a token in a more 'correct' fashion, but at that point things would be getting so roundabout you might as well just be using Boto.
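The gsutil workaround could be scripted roughly as follows. This is a sketch only: the exact layout of gsutil's debug output is an assumption, so the regular expression just takes whatever non-space text follows the word 'Bearer', and both function names are hypothetical.

```python
import re
import subprocess


def extract_bearer_token(debug_output):
    """Return the first token that follows the word 'Bearer', or None."""
    match = re.search(r"Bearer\s+([^\s'\",]+)", debug_output)
    return match.group(1) if match else None


def fetch_access_token():
    """Run `gsutil -d ls` and pull the token out of its debug output."""
    raw = subprocess.check_output(
        ["gsutil", "-d", "ls"],
        stderr=subprocess.STDOUT,  # debug lines may go to stderr, so merge streams
    )
    token = extract_bearer_token(raw.decode("utf-8", "replace"))
    if token is None:
        raise RuntimeError("no Bearer token found in gsutil debug output")
    return token


# Hypothetical usage inside dev_appserver code:
# cloudstorage.common.set_access_token(fetch_access_token())
```

Because the token still expires, `fetch_access_token` would have to be called again whenever a request starts failing with an authorization error.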
There is a local development server for this purpose, although the tool linked here (gcd) emulates Google Cloud Datastore rather than Cloud Storage: https://developers.google.com/datastore/docs/tools/devserver
Once you have set it up, create a dataset and start the development server
gcd.sh create [options] <dataset-directory>
gcd.sh start [options] <dataset-directory>
Export the environment variables
export DATASTORE_HOST=http://yourmachine:8080
export DATASTORE_DATASET=<dataset_id>
Then you should be able to use the datastore connection in your code, locally.