For instance, if I do something like
f = open("demofile.txt", "w")
f.write("test content")
f.close()
in a serverless environment like Google Cloud Run or Anthos (assume this is part of a web app), will demofile.txt exist permanently, and will I always be able to access it through f.read()?
Your question is strange but I will try to answer it.
You can write files in serverless products such as Cloud Run, Cloud Functions and App Engine: the /tmp directory is writable. BUT it's an in-memory file system. That means you can write and read your data only from the same instance (not from other instances), and the file persists only until the instance is shut down. In addition, the file takes up space in the allocated memory, so the storage limit is the memory size of your instance.
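To illustrate, here is a minimal sketch of what that ephemeral storage looks like from Python code running in a Cloud Run instance (the file name and path are just for illustration):
import os

# /tmp is the writable, in-memory directory on Cloud Run / Cloud Functions / App Engine.
path = "/tmp/demofile.txt"

with open(path, "w") as f:
    f.write("test content")

# Works only on this instance, and only while this instance is alive.
with open(path) as f:
    print(f.read())

# The file counts against the instance's memory allocation.
print(os.path.getsize(path), "bytes of instance memory used")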
With Cloud Run, there is a new preview feature (released publicly only a few days ago) that lets you use the 2nd generation runtime and mount a network file system (Google Cloud Storage with GCSFuse, or Filestore). It's "external storage", but it is seen as a local directory by your app.
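As a rough sketch, your code would then use ordinary file I/O against that directory (the mount point /mnt/gcs is an assumption; it depends on how you configure the mount):
# Assumes a Cloud Storage bucket is mounted at /mnt/gcs via GCSFuse
# (the mount point is hypothetical; use whatever path you configured).
with open("/mnt/gcs/demofile.txt", "w") as f:
    f.write("test content")

with open("/mnt/gcs/demofile.txt") as f:
    print(f.read())  # visible to other instances, and survives instance shutdown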
A last point on Anthos (because you mentioned it in your question): Anthos is a suite of products that lets you manage, from the Google Cloud console, resources (mainly Kubernetes clusters) running outside of Google Cloud. There is a version of Cloud Run for Anthos, but Anthos isn't a serverless product itself.
I am developing a project in Google Cloud using both their App Engine and Compute Engine. I have a virtual machine instance set up on Compute Engine, with the name "instance-1". On this instance is the python file (file.py):
name = '<REPLACE_WITH_YOUR_NAME>'
print(name)
Well, this isn't exactly the file.py, but the concept applies below. Additionally, I have an App Engine project written in NodeJS, which is connecting to this instance via Google's Compute Engine API. Here is what I have in regards to that:
const Compute = require('@google-cloud/compute');
const compute = new Compute();
const zone = compute.zone('us-east1-b');
const vm_name = 'instance-1';
const vm = zone.vm(vm_name);
const my_name = "David Weiss"
// TODO: insert the variable my_name into the python code file.py where it says '<REPLACE_WITH_YOUR_NAME>'
// In my head, it would look something like this: vm.getFile('file.py', 'write').replace('<REPLACE_WITH_YOUR_NAME>', my_name);
After getting the instance 'instance-1', I don't know how to modify (or even add/replace/delete) files on it using NodeJS and the Compute Engine API. Can this be done? If it's not possible to replace the text within file.py, I would be okay with deleting the entire file and just writing a brand new file with my_name already inserted in there.
Think of your Compute Engine instance as a "computer". If you had files on a PC, how would you modify the files on that PC remotely? The GCP API for Compute Engine doesn't give you access to the file system of that instance. Instead, you would have to use technology such as scp or ftp. Perhaps if you described the higher-level story, there might be alternative concepts we could use. For example, when a Compute Engine instance boots, it can run a startup script that might copy files from someplace (e.g. GCS). Another possibility is that your Compute Engine instance could itself run an app against which you could make REST requests, passing the data that should become the content of a file written by that app.
If you still want to go down the file copy route and you want the requesting app to be written in python, then a possibility might be to review:
How to scp in Python?
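For reference, a minimal sketch of that route using the paramiko and scp packages (the host, username, key path and remote path are placeholders; the packages must be installed and SSH access to the VM already configured):
from paramiko import SSHClient, AutoAddPolicy
from scp import SCPClient

ssh = SSHClient()
ssh.set_missing_host_key_policy(AutoAddPolicy())
# Placeholder external IP, username and key for instance-1
ssh.connect("203.0.113.10", username="david", key_filename="/path/to/private_key")

with SCPClient(ssh.get_transport()) as scp:
    scp.put("file.py", remote_path="/home/david/file.py")  # copy local file.py to the VM

ssh.close()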
2020-03-17 - Based on more comments
For provisioning new instances of Compute Engine ... I am sensing that you are using the GCP API to create new Compute Engine instances through your App Engine app. If the puzzle you pose were before me, I'd be thinking down the following lines.
When the App Engine app determines that a new Compute Engine instance is to be created, we obviously have to give that new instance a unique name. No two Compute Instances can have the same identity. We thus have a "key" for that instance. Next, I would have the App Engine app create a file in Google Cloud Storage (GCS) that contains the exact file that you want inside the Compute Engine. Your app could build the content of the file dynamically. There would be one file per compute engine instance and the file name would match the name of the compute engine instance. At this point, we have your desired file in GCS. Next, I would create a shell script that copies the file from GCS (based on the name of the compute engine in which the script runs) to the local compute engine instance file system. Finally, I would specify this script as a "startup script" that is executed when the Compute Engine boots.
When the compute engine DOES boot, it will run the startup script early on in the boot cycle, prior to the user being able to log in. The script would copy the file from GCS to the local file system, and when the script completes, the user can log in and will find the file.
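A sketch of what such a startup script could look like in Python (the bucket name and destination path are assumptions; the instance name is read from the metadata server, and the script assumes google-cloud-storage and requests are available on the instance):
import requests
from google.cloud import storage

# Ask the metadata server which instance we are running on.
METADATA_URL = "http://metadata.google.internal/computeMetadata/v1/instance/name"
instance_name = requests.get(METADATA_URL, headers={"Metadata-Flavor": "Google"}).text

# Download the file that the App Engine app staged for this instance.
client = storage.Client()
bucket = client.bucket("my-staging-bucket")  # hypothetical bucket name
blob = bucket.blob("startup-files/{}/file.py".format(instance_name))
blob.download_to_filename("/home/david/file.py")  # hypothetical destination path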
See also:
Running startup scripts
My Python server script, which runs on a Google Cloud VM instance, tries to save an image (JPEG) to storage, but it throws the following error.
File "/home/thamindudj_16/server/object_detection/object_detector.py", line 109, in detectHand
    new_img.save("slicedhand/{}#sliced_image{}.jpeg".format(threadname, i))
File "/home/thamindudj_16/.local/lib/python3.5/site-packages/PIL/Image.py", line 2004, in save
    fp = builtins.open(filename, "w+b")
OSError: [Errno 5] Input/output error: 'slicedhand/thread_1#sliced_image0.jpeg'
All the files, including the Python scripts, are in a Google Cloud Storage bucket that has been mounted on the VM instance using gcsfuse. The app tries to save the new image in the slicedhand folder.
Python code snippet where the image saving happens:
from PIL import Image
...
...
i = 0
new_img = Image.fromarray(bounding_box_img) ## conversion to an image
new_img.save("slicedhand/{}#sliced_image{}.jpeg".format(threadname, i))
I think the problem may be with access permissions. The docs say to use --key_file, but what key file should I use, and where can I find it? I'm not sure whether this is the actual problem or something else.
Any help would be appreciated.
I understand that you are using gcsfuse on your Linux VM instance to access Google Cloud Storage.
The key file is a Service Account credentials key that allows you to initialize the Cloud SDK or a client library as another Service Account. You can download the key file from the Cloud Console. However, if you are using a VM instance, you are automatically using the Compute Engine default Service Account. You can check it using the console command: $ gcloud init.
To configure properly your credentials, please follow the documentation.
The Compute Engine default Service Account needs to have the access scope Storage > Full enabled. Access scopes are the mechanism that limits the access level to Cloud APIs. They can be set during machine creation or while the VM instance is stopped.
Please note that access scopes are defined explicitly for the Service Account that you select for the VM instance.
Cloud Storage object names have naming requirements. It is strongly recommended to avoid using the hash symbol "#" in object names.
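Applied to the snippet from the question, that would mean dropping the "#" from the object name, for example (the underscore is just one possible replacement):
# Avoid "#" in object names when writing through the gcsfuse mount.
new_img.save("slicedhand/{}_sliced_image{}.jpeg".format(threadname, i))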
I am working in Python with Google Cloud ML-Engine. The documentation I have found indicates that data storage should be done with Buckets and Blobs
https://cloud.google.com/ml-engine/docs/tensorflow/working-with-cloud-storage
However, much of my code, and the libraries it calls, work with files. Can I somehow treat Google Cloud Storage as a file system in my ml-engine code?
I want my code to read like
with open(<something>) as f:
    for line in f:
        dosomething(line)
Note that in ml-engine one does not create and configure VM instances. So I can not mount my own shared filesystem with Filestore.
The only way to have Cloud Storage appear as a filesystem is to mount a bucket as a file system:
You can use the Google Cloud Storage FUSE tool to mount a Cloud Storage bucket to your Compute Engine instance. The mounted bucket behaves similarly to a persistent disk even though Cloud Storage buckets are object storage.
But you cannot do that if you can't create and configure VMs.
Note that in ml-engine one does not create and configure VM instances.
That's not entirely true. I see ML Engine supports building custom containers, which is typically how one can install and configure OS-level dependencies. But only for the training area, so if your needs are in that area it may be worth a try.
I assume you already checked that the library doesn't support access through an already open file-like handler (if not then maybe of interest would be How to restore Tensorflow model from Google bucket without writing to filesystem?)
For those that come after, here is the answer
Google Cloud ML and GCS Bucket issues
from tensorflow.python.lib.io import file_io
Here is an example
with file_io.FileIO("gs://bucket_name/foobar.txt", "w") as f:
    f.write("FOO")
    f.flush()
print("Write foobar.txt")
with file_io.FileIO("gs://bucket_name/foobar.txt", "r") as f:
    for line in f:
        print("Read foobar.txt: " + line)
I am new to Google App Engine and I am a little bit confused by the answers related to connecting to a local Datastore.
My ultimate goal is to stream data from a Google Datastore towards a BigQuery dataset, similar to https://blog.papercut.com/google-cloud-dataflow-data-migration/. I have a copy of this Datastore locally, accessible when I run a local App Engine, i.e. I can access it through an admin console when I use $[GOOGLE_SDK_PATH]/dev_appserver.py --datastore_path=./datastore.
I would like to know if it is possible to connect to this datastore using services outside of the App Engine instance, with the Python google-cloud-datastore library or even the Apache Beam ReadFromDatastore method. If not, should I use the Datastore Emulator with the App Engine-generated Datastore file?
If anyone has an idea of how to proceed, I would be more than grateful to hear it.
If it is possible, it would have to be through the Datastore Emulator, which can also serve apps other than App Engine. But it ultimately depends on the implementation of the libraries you intend to use, i.e. whether the underlying access methods understand the DATASTORE_EMULATOR_HOST environment variable pointing to a running Datastore emulator and use that instead of the real Datastore. I guess you'll just have to give it a try.
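As an illustration of what giving it a try could look like with google-cloud-datastore (the emulator host/port and project id are assumptions matching a typical local setup):
import os
from google.cloud import datastore

# Point the client library at the running Datastore emulator instead of the real service.
os.environ["DATASTORE_EMULATOR_HOST"] = "localhost:8081"  # assumed emulator address

client = datastore.Client(project="my-local-project")  # assumed local project id

# Write and read back a test entity; this should hit the emulator, not Cloud Datastore.
key = client.key("Task", "sample-task")
entity = datastore.Entity(key=key)
entity["description"] = "hello from the emulator"
client.put(entity)
print(client.get(key))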
But be aware that the internal format of the local storage dir used by the Datastore Emulator may be different from the one used by the development server, so make a backup of your .datastore dir before trying stuff, just in case. From Local data format conversion:
Currently, the local Datastore emulator stores data in sqlite3 while the Cloud Datastore Emulator stores data as Java objects.
When dev_appserver is launched with legacy sqlite3 data, the data will be converted to Java objects. The original data is backed up with the filename {original-data-filename}.sqlitestub.
In the Google App Engine Firebase tic-tac-toe example here: https://cloud.google.com/solutions/using-firebase-real-time-events-app-engine
ndb is used to create the Game data model. This model is used in the code to store the state of the tic-tac-toe game. I thought ndb was used to store data in Cloud Datastore, but, as far as I can tell, nothing is being stored in the Cloud Datastore of the associated Google Cloud project. I think this is because I am launching the app in 'dev mode' with python dev_appserver.py app.yaml. In this case, is the data being stored in memory instead of actually being written to Cloud Datastore?
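For context, the Game model in that example looks roughly like this (a simplified sketch, not the exact tutorial code; field names are illustrative):
from google.appengine.ext import ndb

class Game(ndb.Model):
    # Simplified sketch of the tic-tac-toe Game model.
    userX = ndb.UserProperty()
    userO = ndb.UserProperty()
    board = ndb.StringProperty()
    moveX = ndb.BooleanProperty()
    winner = ndb.StringProperty()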
You're correct, running the application locally is using a datastore emulation, contained inside dev_appserver.py.
The data is not stored in memory, but on the local disk. So even if the development server restarts it will still find the "datastore" data written in a previous execution.
You can check the data actually saved using the local development server's admin interface at http://localhost:8000/datastore
Dan's answer is correct; your "dev_appserver.py" automatically creates a local datastore.
I would like to add that if you do wish to emulate a real Cloud Datastore environment and be able to generate usable indexes for your production Cloud Datastore, we have an emulator that can do that. I assume that's why you want your dev app to use the real Datastore?
Either way, if you are just doing testing and need persistent storage for tests (not for production), then both the default dev server's local storage and the Cloud Datastore Emulator will suffice.