How can I upload data into a high-replication GAE app? - python

Until now, I've been using the bulkloader to upload data into an app, but I've noticed that Google has added a warning that the bulkloader is intended for use with the master/slave datastore:
Warning: This document applies to apps that use the master/slave datastore. If your app uses the High Replication datastore, it is possible to copy data from the app, but Google does not currently support this use case. If you attempt to copy from a High Replication datastore, you'll see a high_replication_warning error in the Admin Console, and the downloaded data might not include recently saved entities.
Is there a recommended way of getting data into and out of an app that uses the HR datastore?

Related

Connect to local App Engine Datastore with Apache Beam

I am new to Google App Engine and I am a little bit confused by the answers related to connecting to a local Datastore.
My ultimate goal is to stream data from a Google Datastore towards a BigQuery dataset, similar to https://blog.papercut.com/google-cloud-dataflow-data-migration/. I have a copy of this Datastore locally, accessible when I run a local App Engine, i.e. I can access it through an admin console when I use $[GOOGLE_SDK_PATH]/dev_appserver.py --datastore_path=./datastore.
I would like to know if it is possible to connect to this datastore using services outside of the App Engine instance, with the python google-cloud-datastore library or even the Apache Beam ReadFromDatastore method. If not, should I use the Datastore Emulator with the file generated by the App Engine Datastore?
If anyone has an idea on how to proceed, I would be more than grateful to hear it.
If it is possible, it would have to be through the Datastore Emulator, which is capable of serving apps other than App Engine as well. But it ultimately depends on the implementation of the libraries you intend to use: whether their underlying access methods understand the DATASTORE_EMULATOR_HOST environment variable pointing to a running datastore emulator and use that instead of the real Datastore. I guess you'll just have to give it a try.
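For google-cloud-datastore specifically, a quick way to test it is a minimal sketch along these lines (the emulator host/port and project ID are placeholders; the client is supposed to pick up DATASTORE_EMULATOR_HOST from the environment):

import os

# Point the client at a locally running Datastore emulator instead of the
# real Cloud Datastore. Host/port and project ID below are placeholders.
os.environ["DATASTORE_EMULATOR_HOST"] = "localhost:8081"
os.environ["DATASTORE_PROJECT_ID"] = "my-local-project"

from google.cloud import datastore

client = datastore.Client(project="my-local-project")

# Write and read back a throwaway entity to confirm the client is talking
# to the emulator and not to a live project.
key = client.key("ConnectivityCheck", "probe")
entity = datastore.Entity(key=key)
entity["ok"] = True
client.put(entity)
print(client.get(key))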
But be aware that the internal format of the local storage dir used by the Datastore Emulator may be different from the one used by the development server, so make a backup of your .datastore dir before trying stuff, just in case. From Local data format conversion:
Currently, the local Datastore emulator stores data in sqlite3 while the Cloud Datastore Emulator stores data as Java objects.
When dev_appserver is launched with legacy sqlite3 data, the data will be converted to Java objects. The original data is backed up with the filename {original-data-filename}.sqlitestub.
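As a tiny precaution before letting dev_appserver or the emulator touch the data, you can just copy whatever --datastore_path points at first (the path below is the one from the question and is only an example):

import os
import shutil

src = './datastore'           # whatever you pass to --datastore_path
dst = './datastore.backup'
if os.path.isdir(src):
    shutil.copytree(src, dst)
else:
    shutil.copy2(src, dst)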

How is ndb (and cloud datastore) being used in the firebase tic-tac-toe example

In the Google App Engine Firebase tic-tac-toe example here: https://cloud.google.com/solutions/using-firebase-real-time-events-app-engine
ndb is used to create the Game data model. This model is used in the code to store the state of the tic-tac-toe game. I thought ndb was used to store data in Cloud Datastore, but, as far as I can tell, nothing is being stored in the Cloud Datastore of the associated Google Cloud project. I think this is because I am launching the app in 'dev mode' with python dev_appserver.py app.yaml. In this case, is the data being stored in memory instead of actually being written to Cloud Datastore?
You're correct: running the application locally uses a datastore emulation bundled inside dev_appserver.py.
The data is not stored in memory, but on the local disk. So even if the development server restarts it will still find the "datastore" data written in a previous execution.
You can check the data actually saved using the local development server's admin interface at http://localhost:8000/datastore
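For example, with a minimal ndb model along these lines (the fields here are illustrative, not necessarily the ones from the tic-tac-toe sample), anything you put() while running under dev_appserver.py lands in that local datastore file and shows up in the admin page above, not in your project's Cloud Datastore:

from google.appengine.ext import ndb

class Game(ndb.Model):
    # Illustrative fields only; the real sample's Game model differs.
    board = ndb.StringProperty()
    moveX = ndb.BooleanProperty()

def save_example_game():
    # Under dev_appserver.py this writes to the local datastore file.
    game = Game(id='example-game', board=' ' * 9, moveX=True)
    game.put()
    return Game.get_by_id('example-game')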
Dan's answer is correct; your "dev_appserver.py" automatically creates a local datastore.
I would like to add that if you do wish to emulate a real Cloud Datastore environment and be able to generate usable indexes for your production Cloud Datastore, we have an emulator that can do that. I assume that's why you want your dev app to use the real Datastore?
Either way, if you're just doing testing and need persistent storage for those tests (not for production), then both the default dev server local storage and the Cloud Datastore Emulator will suffice.

Is it possible to use `S3 storage` as HDF5 file storage and share it with another server?

I'm going to get tons of data via a stock API (from a Korean stock company) and I'm trying to store all of it in HDF5 format using Python PyTables. There is another server running a Django web application, and this web app will use those HDF5 files.
But there are some limitations:
1. The API is only available on Windows OS
2. Data (in pandas or numpy form) must be stored in HDF5 format, not in a SQL database
This is how the system looks:
Data generator: a Windows OS computer board such as LattePanda (http://www.lattepanda.com/), or maybe an AWS Windows EC2 instance(?) # Not sure about this. Need your advice
Data storage: AWS S3
Web application: AWS EC2
So, LattePanda runs at my home and fetches stock data periodically, then stores it in AWS S3. The Django web application running on EC2 will fetch this data and render it in templates.
Is this an optimal system composition?
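No opinion on the overall architecture, but the S3 round trip itself is straightforward; here is a rough sketch of both sides using boto3 and pandas/PyTables (bucket name, key, and the DataFrame contents are placeholders):

import boto3
import pandas as pd

# Producer side (the Windows box): write a DataFrame to HDF5 and push it to S3.
df = pd.DataFrame({'ticker': ['005930'], 'price': [70000]})  # placeholder data
df.to_hdf('stocks.h5', key='stocks', mode='w')               # requires PyTables
s3 = boto3.client('s3')
s3.upload_file('stocks.h5', 'my-stock-bucket', 'data/stocks.h5')

# Consumer side (the Django app on EC2): pull the file down and read it.
s3.download_file('my-stock-bucket', 'data/stocks.h5', '/tmp/stocks.h5')
stocks = pd.read_hdf('/tmp/stocks.h5', 'stocks')

The main caveat is that HDF5 files on S3 cannot be read in place or appended to efficiently, so each update means re-uploading whole files (or sharding the data across many smaller ones).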

Google AppEngine - How To Perform a Partial Datastore Download

I have a running GAE app that has been collecting data for a while. I am now at the point where I need to run some basic reports on this data and would like to download a subset of the live data to my dev server. Downloading all entities of a kind will simply be too big a data set for the dev server.
Does anyone know of a way to download a subset of entities from a particular kind? Ideally it would be based on entity attributes like date, or client ID etc... but any method would work. I've even tried a regular, full download and then arbitrarily killed the process when I thought I had enough data, but it seems the data is locked up in the .sql3 files generated by the bulkloader.
It looks like the default GAE datastore download/upload utilities (appcfg.py and bulkloader.py) don't support filtering.
It seems reasonable to do one of two things:
write a utility (select + export + save to a local file) and execute it locally, accessing the remote GAE datastore through the remote API shell (see the sketch below)
write an admin web handler that does select + export + zip: add a new URL to the app's handlers, deploy to GAE, and call it over HTTP
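A rough sketch of the first option, run from inside remote_api_shell.py, which already points the datastore API at the live app (the model, property names, and date filter here are placeholders for whatever subset you actually need):

import csv
import datetime

from models import MyKind  # hypothetical module/model; substitute your own

cutoff = datetime.datetime.utcnow() - datetime.timedelta(days=30)
query = MyKind.all().filter('created >=', cutoff)  # old db-style query

with open('subset.csv', 'wb') as out:
    writer = csv.writer(out)
    writer.writerow(['key', 'created', 'client_id'])
    for entity in query.run(batch_size=500):
        writer.writerow([str(entity.key()), entity.created, entity.client_id])

The resulting CSV can then be loaded into the dev server with a short import script.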

In the python Google App Engine, how do I export all the entities of a model to a file in Google Storage for developers?

I have about 900K entities of a model in python GAE that I would like to export to a CSV file for offline testing. I can use the appcfg.py download_data option, but in this case I don't want to back up to my local machine. I'd like a faster way to create the file in GAE, save it to Google Storage or elsewhere, and download it later from multiple machines.
I'm assuming that I will need to do this in a task since it will likely take more than 30 seconds for the operation to complete.
from google.appengine.ext import db

class MyModel(db.Model):
    foo = db.StringProperty(required=True)
    bar = db.StringProperty(required=True)

def backup_mymodel_to_file():
    # What to do here?
    pass
Your best option will be to use the MapReduce library to export the relevant data to the blobstore, then upload the completed file to Google Storage.
Note that integration between Google Storage and App Engine is a work in progress.
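As a very rough sketch of that approach (class paths and parameter names follow the old App Engine MapReduce library and vary between versions, so treat this as an outline rather than a working configuration):

from mapreduce import base_handler, mapreduce_pipeline

def mymodel_to_csv(entity):
    # Mapper: called once per MyModel entity; yielded strings go to the output writer.
    yield '%s,%s\n' % (entity.foo, entity.bar)

class ExportMyModelPipeline(base_handler.PipelineBase):
    def run(self):
        # Reader/writer specs and params are assumptions based on the library's
        # docs of that era; check them against your installed version.
        yield mapreduce_pipeline.MapperPipeline(
            'export-mymodel',
            handler_spec='backup.mymodel_to_csv',
            input_reader_spec='mapreduce.input_readers.DatastoreInputReader',
            output_writer_spec='mapreduce.output_writers.BlobstoreOutputWriter',
            params={
                'entity_kind': 'backup.MyModel',
                'mime_type': 'text/csv',
            },
            shards=16,
        )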
I know this is old, but I posted an example of using the App Engine Mapper API dumping datastore data into Cloud Storage here:
Google App Engine: Using Big Query on datastore?
