How to migrate a Google App Engine project to Compute Engine completely? (Python)

We have been using Google App Engine for the backend services of a project which was developed entirely as a Google App Engine project.
Lately, the frontend instances have been consuming 60-70% of our project expense, so we decided to do away with GAE completely and migrate to Google Compute Engine instead.
Wanted to know if anyone has migrated their GAE project to GCE. I understand that GCE VMs could be dynamically spun up from within a GAE app, but we want to completely do away with GAE. (Source)
As a last option, I shall host a Django project and use GAE files as the controller for the web services.
However, wanted to know if there are other potentially easier options for moving GAE projects to GCE while keeping the datastore integration intact.
TIA

Unfortunately, the application's dependence on features unique to the standard environment may make your migration quite difficult.
Take, for example, the significant differences just between the standard env and the flexible env (which, if you want, would be like an intermediary step towards total migration to GCE): Migrating Services from the Standard Environment to the Flexible Environment. To me they're practically different beasts.
To make matters worse the very thing you consider the most important in your migration - keeping the datastore integration intact - is also the most likely to stand against your migration.
That's because chances are your app uses one of the dedicated client libraries, optimized for and available only to standard environment GAE apps. If so, the migration effectively means re-designing the entire interaction with the datastore to use one of the more generic datastore libraries instead, which means more than just translating API calls: there are conceptual and functional differences that need to be addressed.
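To give a feel for the kind of translation involved, here is a minimal sketch contrasting the standard-env-only ndb style with the generic google-cloud-datastore client; the Task kind and its field are made up for illustration:
# Standard-env-only style (works only inside GAE standard):
#
#   from google.appengine.ext import ndb
#
#   class Task(ndb.Model):
#       description = ndb.StringProperty()
#
#   Task(description='hello').put()

# Roughly equivalent code with the generic client, usable from GCE
# (or anywhere with credentials configured):
from google.cloud import datastore

client = datastore.Client()  # project/credentials picked up from the environment

entity = datastore.Entity(key=client.key('Task'))  # partial key; ID assigned on put
entity.update({'description': 'hello'})
client.put(entity)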
So the answer to the title question may very well be: redesign your app for GCE. Personally I'm unsure if GCE is overall more cost-effective - I still prefer the standard env GAE. Assuming at some point the costs go up enough to maybe re-consider, I'd:
take a closer look at the pricing and the current app costs breakdown, to see which components are the heavier ones: if the majority of the costs come, for example, from the datastore usage - I wouldn't expect a migration to GCE to significantly help
try to tune the app's config and/or code to reduce costs: for example if the instance hours represent the majority in the costs tuning the scalability configurations depending on the actual traffic patterns might lower the bill
estimate the costs for similar usage patterns but with the corresponding components available on GCE (and/or GAE flex); a rough back-of-envelope sketch follows this list
if the respective components are also available on GAE flex I'd make some experiments using that instead of going full GCE (which would pretty much require the re-write first).
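As a purely illustrative back-of-envelope estimate (every rate below is a made-up placeholder; check the actual pricing pages):
# All numbers are hypothetical placeholders, not real prices.
gae_instance_hours_per_day = 240   # read this off the billing breakdown
gae_rate_per_hour = 0.05           # placeholder frontend-instance rate

gce_vms_needed = 4                 # estimated to absorb the same load
gce_rate_per_hour = 0.04           # placeholder VM rate

gae_daily = gae_instance_hours_per_day * gae_rate_per_hour
gce_daily = gce_vms_needed * 24 * gce_rate_per_hour

print('GAE/day: $%.2f  GCE/day: $%.2f' % (gae_daily, gce_daily))
# Note that GCE VMs bill for every hour they run, while GAE instances
# can scale to zero; the traffic shape matters as much as the rates.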
A gradual transition using the flexible environment as a stepping stone could reveal whether the estimated cost savings are actually there, helping you drop the whole transition before doing the entire re-write. It could also help with the re-write, should the transition remain a "go".
Update: There might be another solution to consider for reducing costs: running the existing GAE app code through AppScale (see also appscale) on a more cost-effective IaaS provider.

Related

What are the cost/performance advantages of using "golang" in GAE

In terms of the quotas/usage limits per instance, is there any considerable improvement or advantage to using Go on Google App Engine instead of the other languages offered on GAE, like Python, Java, or PHP, or do they all behave the same?
Or does every instance, no matter the language in use, behave the same way and handle roughly the same maximum requests/sec, given that this depends more on the "GAE load balancer" and infrastructure than on the programming language? And does the same logic apply to memory and CPU usage?
App Engine doesn't have explicit limits or restrictions that apply only when using a specific language. However, the languages and their technologies may imply certain limitations. For example, a Java Virtual Machine instance by itself requires significantly more memory and has a significantly higher startup time (even when warmup requests are enabled) than the built-in web server of Go, so on a Java instance less memory remains for the webapp itself to allocate and use (for a given instance type).
I don't have concrete measures to compare, but (in case of Go):
"Code is deployed in source form and compiled in the cloud... Go is the first true compiled language that runs on App Engine. Go on App Engine makes it possible to deploy efficient, CPU-intensive web applications". (source)
If you think about it, the other languages on App Engine are all interpreted (including Java, whose bytecode is interpreted by a virtual machine), while Go is compiled into, and runs as, platform-dependent native code. That alone says something about performance.
For a "case-study" check out the following blog post:
From zero to Go: launching on the Google homepage in 24 hours
This blog also contains some performance report of a real-world app used by millions:
This chart - taken directly from the App Engine dashboard - shows average request latency during launch. As you can see, even under load it never exceeds 60 ms, with a median latency of 32 milliseconds. This is wicked fast, considering that our request handler is doing image manipulation and encoding on the fly.
App Engine uses the web server that is included in the Go standard library to serve your app, so that also means you can easily port a Go web app to App Engine, and that you know exactly what to expect from the web server serving your app on App Engine.
Found official time comparisons of Python, Java and Go
The App Engine System Status dashboard can be considered official and a good comparison base.
You can click on any cells belonging to a specific day and language, and you get detailed historical statistics for Static and Dynamic GET latency (both secure and unsecure), Error rates, CPU usage/latency. These statistics are measured on an instance that is already up and ready to serve.
Analysing it for the day of January 27, 2015 here are the conclusions for Go, Java and Python:
Dynamic latency is roughly the same for all
CPU latency (computing the 33rd Fibonacci number; see the snippet below) is best for Java, followed by Go, with Python the slowest.
Static file serving time is roughly the same for all three, with Go marginally the fastest.
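For context, CPU benchmarks of this sort typically use a deliberately naive recursive Fibonacci; whether the status site uses exactly this implementation is an assumption, but in Python it would look like:
def fib(n):
    # Deliberately naive recursion: the point is to burn CPU cycles,
    # not to compute Fibonacci numbers efficiently.
    if n < 2:
        return n
    return fib(n - 1) + fib(n - 2)

print(fib(33))  # 3524578; slow enough in CPython to be a useful benchmark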

Python bottle vs uwsgi/bottle vs nginx/uwsgi/bottle

I am developing a Python based application (HTTP -- REST or jsonrpc interface) that will be used in a production automated testing environment. This will connect to a Java client that runs all the test scripts. I.e., no need for human access (except for testing the app itself).
We hope to deploy this on Raspberry Pi's, so I want it to be relatively fast and have a small footprint. It probably won't get an enormous number of requests (at max load, maybe a few per second), but it should be able to run and remain stable over a long time period.
I've settled on Bottle as a framework due to its simplicity (one file). This was a tossup vs Flask. Anybody who thinks Flask might be better, let me know why.
I have been a bit unsure about the stability of Bottle's built-in HTTP server, so I'm evaluating these three options:
Use Bottle only -- As http server + App
Use Bottle on top of uwsgi -- Use uwsgi as the HTTP server
Use Bottle with nginx/uwsgi
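For reference, option 1 at its most minimal would be something like the following (the route, host, and port are arbitrary):
from bottle import Bottle, run

app = Bottle()

@app.get('/ping')
def ping():
    # Bottle serializes returned dicts to JSON automatically.
    return {'status': 'ok'}

# Bottle's built-in default server (wsgiref); single-threaded.
run(app, host='0.0.0.0', port=8080)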
Questions:
If I am not doing anything but Python/uwsgi, is there any reason to add nginx to the mix?
Would the uwsgi/bottle (or Flask) combination be considered production-ready?
Is it likely that I will gain anything by using a separate HTTP server from Bottle's built-in one?
Flask vs Bottle comes down to a couple of things for me.
How simple the app is. If it is very simple, then Bottle is my choice. If not, then I go with Flask. The fact that Bottle is a single file makes it incredibly simple to deploy by just including the file in our source. But the fact that Bottle is a single file should also be a pretty good indication that it does not implement the full WSGI spec and all of its edge cases.
What the app does. If it is going to render anything other than Python -> JSON, then I go with Flask for its built-in support for Jinja2. If I need to do authentication and/or authorization, Flask has some pretty good extensions already for handling those requirements. If I need to do caching, again, Flask-Cache exists and does a pretty good job with minimal setup. I am not entirely sure what is available for Bottle extension-wise, so that may still be worth a look.
The problem with using Bottle's built-in server is that it is single-process/single-thread, which means it can only process one request at a time.
To deal with that limitation you can do any of the following in no particular order.
1. Eventlet's WSGI server wrapping the bottle app (single-threaded, non-blocking I/O, single process)
2. uwsgi or gunicorn (the latter being simpler), most often set up as single-threaded, multi-process (workers)
3. nginx in front of uwsgi
Option 3 matters most if you have static assets to serve, as you can serve those with nginx directly.
Option 2 is really easy to get going (especially gunicorn), though I use uwsgi most of the time because it has more configurability to handle some things that I want.
Option 1 is really simple and performs well... plus there is no external configuration or command-line flags to remember.
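A minimal sketch of option 1 (eventlet must be installed; the route and port are arbitrary):
import eventlet
eventlet.monkey_patch()  # patch the stdlib so blocking I/O yields to other greenlets

from eventlet import wsgi
import bottle

@bottle.get('/ping')
def ping():
    return {'status': 'ok'}

# Single process, single OS thread, but non-blocking I/O lets many
# requests be in flight concurrently.
wsgi.server(eventlet.listen(('0.0.0.0', 8080)), bottle.default_app())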
2017 UPDATE - We now use Falcon instead of Bottle
I still love Bottle, but we reached a point last year where it couldn't scale to meet our performance requirements (100k requests/sec at <100ms). In particular, we hit a performance bottleneck with Bottle's use of thread-local storage. This forced us to switch to Falcon, and we haven't looked back since. Better performance and a nicely designed API.
I like Bottle but I also highly recommend Falcon, especially where performance matters.
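For comparison, a minimal Falcon resource looks like this (Falcon's entry point has been renamed across versions; this follows the modern falcon.App spelling):
import falcon

class PingResource:
    def on_get(self, req, resp):
        # Falcon handlers mutate the response object directly.
        resp.media = {'status': 'ok'}

app = falcon.App()  # spelled falcon.API() in older releases
app.add_route('/ping', PingResource())
# Serve with any WSGI server, e.g.: gunicorn mymodule:app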
I faced a similar choice about a year ago, when I needed a web microframework for a server tier I was building out. I found these slides (and the accompanying lecture) very helpful in sifting through the field of choices: Web micro-framework BATTLE!
I chose Bottle and have been very happy with it. It's simple, lightweight (a plus if you're deploying on Raspberry Pis), easy to use, intuitive, has the features I need, and has been supremely extensible whenever I've needed to add features of my own. Many plugins are available.
Don't use Bottle's built-in HTTP server for anything but dev.
I've run Bottle in production with a lot of success; it's been very stable on Apache/mod_wsgi. nginx/uwsgi "should" work similarly but I don't have experience with it.
I also suggest you look at running bottle via the gevent.pywsgi server. It's awesome, super simple to set up, asynchronous, and very fast.
Plus bottle has an adapter built for it already, so even easier.
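A minimal sketch of that adapter in use (gevent must be installed; the route, host, and port are arbitrary):
from gevent import monkey
monkey.patch_all()  # must run before importing modules that do I/O

import bottle

@bottle.get('/ping')
def ping():
    return {'status': 'ok'}

# server='gevent' selects bottle's built-in GeventServer adapter,
# which handles each connection in a greenlet instead of a thread.
bottle.run(server='gevent', host='0.0.0.0', port=8080)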
I love bottle, and the notion that it is not meant for large projects is ridiculous. It's one of the most efficient and well-written frameworks, and it can be easily molded without a lot of hand-wringing.

Durable architecture in Python distributed application

Wondering about durable architectures for distributed Python applications.
This question I asked before should provide a little guidance about the sort of application it is. We would like to have the ability to have several code servers and several database servers, and ideally some method of deployment that is manageable and not too much of a pain.
The question I mentioned provides an answer that I like, but I wonder how it could be made more durable, or if doing so requires using other technologies. In particular:
I would have my frontend endpoints be the WSGI (because you already have that written) and write the backend to be distributed via messages. Then you would have a pool of backend nodes that would pull messages off of the Celery queue and complete the required work. It would look sort of like:
Apache -> WSGI Containers -> Celery Message Queue -> Celery Workers.
The apache nodes would be behind a load balancer of some kind. This would be a fairly simple architecture to scale and is, if done correctly, fairly reliable. Code for failure in a system like this and you will be fine.
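A minimal sketch of the two halves of that pipeline (the broker URL and the task body are placeholders):
# tasks.py, shared by the WSGI tier and the worker nodes.
from celery import Celery

app = Celery('tasks', broker='amqp://guest@localhost//')  # placeholder broker URL

@app.task
def run_job(job_id):
    # ... the actual long-running backend work goes here ...
    return job_id

# In a WSGI request handler, enqueue instead of doing the work inline:
#     run_job.delay(42)
# Workers started with `celery -A tasks worker` pull jobs off the queue.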
What is the best way to make durable applications? Any suggestions on how to either "code for failure" or design it differently so that we don't necessarily have to? If you think Python might not be suited for this, that is also a valid solution.
Well, to continue on the previous answer I gave:
In my projects I code for failure, because I use AWS for a lot of my hosting needs.
I have implemented database backends that make sure the database's region is accessible and, if not, choose another region from a specified list. This happens transparently to the rest of the system on that node. So if the east-1a region goes down, there are a few other regions I also host in that it will fail over to, such as the west coast. I keep track of in-flight database transactions and send them over to the west coast, dumping them to a file so I can import them into the old database region once it becomes available.
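That failover logic, reduced to a generic sketch (the region list and the connect function are placeholders for whatever driver you use):
REGIONS = ['east-1a', 'west-1a', 'west-2a']  # preferred order; placeholder names

def get_connection(connect):
    """Return a connection from the first reachable region."""
    last_error = None
    for region in REGIONS:
        try:
            return connect(region)   # first region that answers wins
        except ConnectionError as exc:
            last_error = exc         # remember why, then try the next one
    raise last_error                 # every region failed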
My frontend servers sit behind an Elastic Load Balancer that is distributed across multiple regions, which allows for durable recovery if a region fails. But it cannot be fully relied upon, so I am looking into solutions such as running HAProxy and switching my DNS in case my ELB goes down. This is a work in progress and I cannot give specifics on my own solutions.
To make your data processing durable, look into Celery and store the data in a distributed MongoDB deployment to keep your results safe. Using a durable data store for your results lets you get them back in the event of a node crash. It comes at the cost of some performance, but it shouldn't be too terrible if you only rely on soft real-time constraints.
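A sketch of the durability knobs that matter here (the backend URL is a placeholder; newer Celery versions spell the setting task_acks_late, older ones CELERY_ACKS_LATE):
from celery import Celery

app = Celery(
    'tasks',
    broker='amqp://guest@localhost//',     # placeholder broker URL
    backend='mongodb://localhost/celery',  # placeholder durable result store
)

# Acknowledge a message only after the task finishes, so a worker that
# dies mid-task leaves the message on the queue for another worker.
app.conf.task_acks_late = True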
http://www.mnxsolutions.com/amazon/designing-for-failure-with-amazon-web-services.html
The above article talks mostly about AWS, but the ideas apply to any system where you need high availability and durability. Just remember that downtime is OK as long as you minimize it for a subset of users.

What are some of the best practices to do rolling restarts for Django servers to ensure minimum problems?

When doing rolling restarts, some servers are still running the old code while some are being restarted with the new code. If you have a large number of machines/processes, there might be significant delay between the first server and the last server.
This can be a problem when there are changes to the database schema, such as columns being renamed or tables removed. It would mean that the old code (e.g. using previous column names, or old tables) is still being used before the rolling restart is done.
I wonder if Django provides any guarantees or conventions to make this work well. From my own observation, adding new models (tables) and new fields (columns) in Django doesn't seem to cause issues with old code, because the old code doesn't even know they exist and doesn't care.
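To illustrate that observation, a sketch of the kind of additive change that stays safe during a rolling restart (the model and field names are made up):
from django.db import models

class Article(models.Model):
    title = models.CharField(max_length=200)
    # Added in this release. Nullable, so instances still running the
    # old code can INSERT rows without knowing the column exists.
    subtitle = models.CharField(max_length=200, null=True, blank=True)

# Renames are the dangerous case: do them in stages (add the new
# column, copy the data, deploy code that reads both, drop the old).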
Are there any best practices or conventions in Django that one should follow to ensure minimum problems when doing a rolling restart?
There are infinite ways to deploy a Django application; each web platform stack may have a distinct set of "best practices". That said, the twelve factors are good design principles for modern web applications.
So far, I have deployed Django using:
linux + apache + mod_wsgi
linux + nginx + uwsgi
I prefer to configure the server to reload the application when I touch some file (usually a blank file named "reload.me"); mod_wsgi reloads when the WSGI script file is touched, and uwsgi has a touch-reload option for the same purpose.
In my experience it is a non-issue, that is why you may not find much information about it.

Faster App Engine Development Datastore Alternative

Is there a way to use a real database (SQLite, MySQL, or even some non-relational one) as the datastore for development, instead of the memory/file datastore that is provided?
I saw a few projects, e.g. GAE-SQLite (which did not seem to be working), and one tip about accessing the production datastore using the remote API (still pretty slow for large datasets).
MongoDB works great for that. You will need:
The MongoDB stub: http://github.com/mongodb/mongo-appengine-connector
MongoDB: http://www.mongodb.org/display/DOCS/Downloads
Some code to set it up, like:
import os

from google.appengine.api import apiproxy_stub_map
import datastore_mongo_stub

# Stubs are registered per application ID.
os.environ['APPLICATION_ID'] = 'test'

# Point the datastore API at MongoDB instead of the file-backed stub.
datastore = datastore_mongo_stub.DatastoreMongoStub(
    os.environ['APPLICATION_ID'], 'woot', '', require_indexes=False)
apiproxy_stub_map.apiproxy.RegisterStub('datastore_v3', datastore)
But if you're looking for truly faster development (like I was), the datastore is actually not the issue as much as the single-threaded web server. I tried to replace it with Spawning, but that was a little too hard. You could also try to set up TyphoonAE, which mimics the App Engine stack with open alternatives.
Be aware that if you do any of these, you might lose some of the exact behavior the current tools provide, meaning that if you deploy you could get results you didn't expect. In other words; make sure you know what you're doing :-)
The Google App Engine SDK for Python now bundles support for SQLite. See the official docs for more information.
bdbdatastore is an alternative datastore backend that's considerably better than the one built into the development server, although the datastore is far from the only problem with the dev server when it comes to handling large applications.
