Faster App Engine Development Datastore Alternative - python

Is there a way to use a real database (SQLite, MySQL, or even some non-relational one) as the datastore for development, instead of the memory/file datastore that is provided?
I saw a few projects, such as GAE-SQLite (which did not seem to be working), and one tip about accessing the production datastore using the remote API (still pretty slow for large datasets).

MongoDB works great for that. You will need:
The MongoDB stub: http://github.com/mongodb/mongo-appengine-connector
MongoDB: http://www.mongodb.org/display/DOCS/Downloads
Some code to set it up, like:

import os
import datastore_mongo_stub
from google.appengine.api import apiproxy_stub_map

os.environ['APPLICATION_ID'] = 'test'

# Route all datastore_v3 API calls to MongoDB instead of the default stub
datastore = datastore_mongo_stub.DatastoreMongoStub(
    os.environ['APPLICATION_ID'], 'woot', '', require_indexes=False)
apiproxy_stub_map.apiproxy.RegisterStub('datastore_v3', datastore)
But if you're looking for truly faster development (like I was), the datastore is actually not the issue so much as the single-threaded web server. I tried to replace it with Spawning, but that was a little too hard. You could also try to set up TyphoonAE, which mimics the App Engine stack with open alternatives.
Be aware that if you do any of these, you might lose some of the exact behavior the current tools provide, meaning that if you deploy you could get results you didn't expect. In other words: make sure you know what you're doing :-)

The Google App Engine SDK for Python now bundles support for SQLite. See the official docs for more information.

bdbdatastore is an alternative datastore backend that's considerably better than the one built in to the development server, although the datastore is far from being the only problem with the dev server when it comes to handling large applications.

Related

Does python with wsgi (uwsgi) under nginx have some small default cache?

In my small web site I feel the need to make some data widely available, to avoid hitting the database on every request. E.g., this could be the list of current users shown at the bottom of every page, or the time of the last ranking update.
The application is written in Python (Flask) and runs on nginx + uWSGI (this Docker image).
I wonder, do I get some small cache or shared memory for keeping such information "out of the box", or do I need to explicitly set up some dedicated cache? Or perhaps something like this is provided by nginx?
Alternatively, I can still use the database for this, since it has its own cache, I think.
Sorry if the question seems naive/silly; I come from the Java world (where things are a bit different, as we serve all requests with one fat instance of a Java application) and have some difficulty grasping what powers WSGI/uWSGI provides. Thanks in advance!
Firstly, nginx has a cache:
https://www.nginx.com/blog/nginx-caching-guide/
But for Flask caching you also have options:
https://pythonhosted.org/Flask-Cache/
http://flask.pocoo.org/docs/1.0/patterns/caching/
Did you have a look at the caching section of the Flask docs?
It literally says:
Flask itself does not provide caching for you, but Werkzeug, one of the libraries it is based on, has some very basic cache support
You create a cache object once and keep it around, similar to how Flask objects are created. If you are using the development server you can create a SimpleCache object, that one is a simple cache that keeps the item stored in the memory of the Python interpreter:
from werkzeug.contrib.cache import SimpleCache
cache = SimpleCache()
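For illustration, here is a minimal sketch of how that cache could be used in a view; the key name, the 60-second timeout, and the query_db_for_users helper are assumptions, not from the docs:

from werkzeug.contrib.cache import SimpleCache

cache = SimpleCache()

def get_current_users():
    users = cache.get('current-users')  # returns None on a cache miss
    if users is None:
        users = query_db_for_users()    # hypothetical DB helper
        cache.set('current-users', users, timeout=60)  # keep for 60 seconds
    return users

Note that SimpleCache lives in the memory of a single Python interpreter, so with multiple uWSGI workers each worker gets its own copy; for a cache shared across workers you'd need something like memcached or Redis.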
-- UPDATE --
Or you could solve it on the frontend side by storing data in the web browser's local storage.
If there's nothing in the local storage you call the DB; otherwise you use the information from local storage rather than making a DB call.
Hope it helps.

How to migrate Google App Engine Project to Compute Engine completely?

We have been using Google App Engine for the backend services of a project which has been developed completely as a Google App Engine project.
Lately, the front end instances were consuming more than 60-70% of our project expense. Thus we decided to do away with it completely and migrate to Google Compute Engine instead.
Wanted to know if anyone has migrated their GAE project to GCE. I understand that GCE VMs could be dynamically spun up from within a GAE app, but we want to completely do away with GAE. (Source)
As a last option, I shall host a Django project and use GAE files as the controller for the web services.
However, wanted to know if there are other potentially easier options for moving GAE projects to GCE while keeping the datastore integration intact.
TIA
Unfortunately, the app's reliance on services unique to the standard environment may make your migration quite difficult.
Take, for example, the significant differences just between the standard env and the flexible env (which, if you want, would be like an intermediary step towards total migration to GCE): Migrating Services from the Standard Environment to the Flexible Environment. To me they're practically different beasts.
To make matters worse, the very thing you consider the most important in your migration - keeping the datastore integration intact - is also the most likely to stand in the way of it.
That's because chances are that your app uses one of the dedicated client libraries, optimized for and only available to the standard environment GAE apps. If so - the migration effectively means re-designing the entire interaction with the datastore to make it use one of the more generic datastore libraries instead. Which means more than just translating API calls - there are conceptual and functional differences that would need to be addressed.
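To make the difference concrete, here is a rough sketch of the same write in both worlds, using a hypothetical Flight kind (the standard-env ndb library versus the generic google-cloud-datastore client):

# Standard environment only: the ndb client library
from google.appengine.ext import ndb

class Flight(ndb.Model):
    origin = ndb.StringProperty()

Flight(origin='SFO').put()

# From GCE you'd use the generic client instead
from google.cloud import datastore

client = datastore.Client()
entity = datastore.Entity(key=client.key('Flight'))  # partial key; id assigned on put
entity['origin'] = 'SFO'
client.put(entity)

Beyond the surface API, things like transactions, automatic caching, and query semantics also behave differently between the two.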
So the answer to the title question may very well be: redesign your app for GCE. Personally I'm unsure if GCE is overall more cost-effective - I still prefer the standard env GAE. Assuming at some point the costs go up enough to maybe re-consider, I'd:
take a closer look at the pricing and the current app costs breakdown, to see which components are the heavier ones: if the majority of the costs come, for example, from the datastore usage - I wouldn't expect a migration to GCE to significantly help
try to tune the app's config and/or code to reduce costs: for example if the instance hours represent the majority in the costs tuning the scalability configurations depending on the actual traffic patterns might lower the bill
estimate the costs for similar usage patterns but with the corresponding components available on GCE (and/or GAE flex)
if the respective components are also available on GAE flex I'd make some experiments using that instead of going full GCE (which would pretty much require the re-write first).
A gradual transition using the flexible environment as a stepping stone could reveal if the estimated costs savings aren't quite there, thus helping drop the whole transition before doing the entire re-write. And also could help with the re-write, in case the transition still remains a "go".
Update: There might be another solution to consider for reducing costs: running the existing GAE app code through AppScale (see also appscale) on a more cost-effective IaaS provider.

Python bottle vs uwsgi/bottle vs nginx/uwsgi/bottle

I am developing a Python based application (HTTP -- REST or jsonrpc interface) that will be used in a production automated testing environment. This will connect to a Java client that runs all the test scripts. I.e., no need for human access (except for testing the app itself).
We hope to deploy this on Raspberry Pi's, so I want it to be relatively fast and have a small footprint. It probably won't get an enormous number of requests (at max load, maybe a few per second), but it should be able to run and remain stable over a long time period.
I've settled on Bottle as a framework due to its simplicity (one file). This was a tossup vs Flask. Anybody who thinks Flask might be better, let me know why.
I have been a bit unsure about the stability of Bottle's built-in HTTP server, so I'm evaluating these three options:
Use Bottle only -- As http server + App
Use Bottle on top of uwsgi -- Use uwsgi as the HTTP server
Use Bottle with nginx/uwsgi
Questions:
If I am not doing anything but Python/uwsgi, is there any reason to add nginx to the mix?
Would the uwsgi/bottle (or Flask) combination be considered production-ready?
Is it likely that I will gain anything by using a separate HTTP server from Bottle's built-in one?
Flask vs Bottle comes down to a couple of things for me.
How simple is the app? If it is very simple, then Bottle is my choice. If not, then I go with Flask. The fact that Bottle is a single file makes it incredibly simple to deploy by just including the file in our source. But the fact that Bottle is a single file should be a pretty good indication that it does not implement the full WSGI spec and all of its edge cases.
What does the app do? If it is going to have to render anything other than Python->JSON then I go with Flask for its built-in support for Jinja2. If I need to do authentication and/or authorization then Flask has some pretty good extensions already for handling those requirements. If I need to do caching, again, Flask-Cache exists and does a pretty good job with minimal setup. I am not entirely sure what is available for Bottle extension-wise, so that may still be worth a look.
The problem with using Bottle's built-in server is that it is single-process / single-threaded, which means you can only process one request at a time.
To deal with that limitation you can do any of the following, in no particular order.
1. Eventlet's wsgi server wrapping the bottle app (single-threaded, non-blocking I/O, single process; see the sketch after this list)
2. uwsgi or gunicorn (the latter being simpler), which is most often set up as single-threaded, multi-process (workers)
3. nginx in front of uwsgi
3 is most important if you have static assets you want to serve up as you can serve those with nginx directly.
2 is really easy to get going (esp. gunicorn) - though I use uwsgi most of the time because it has more configurability to handle some things that I want.
1 is really simple and performs well... plus there is no external configuration or command line flags to remember.
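Here's a minimal sketch of option 1, assuming your Bottle app object is importable as app from a hypothetical myapp module:

import eventlet
from eventlet import wsgi
from myapp import app  # your bottle.Bottle() instance

# eventlet's WSGI server handles many concurrent connections
# in a single process via green threads
wsgi.server(eventlet.listen(('0.0.0.0', 8080)), app)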
2017 UPDATE - We now use Falcon instead of Bottle
I still love Bottle, but we reached a point last year where it couldn't scale to meet our performance requirements (100k requests/sec at <100ms). In particular, we hit a performance bottleneck with Bottle's use of thread-local storage. This forced us to switch to Falcon, and we haven't looked back since. Better performance and a nicely designed API.
I like Bottle but I also highly recommend Falcon, especially where performance matters.
I faced a similar choice about a year ago; I needed a web microframework for a server tier I was building out. I found these slides (and the accompanying lecture) to be very helpful in sifting through the field of choices: Web micro-framework BATTLE!
I chose Bottle and have been very happy with it. It's simple, lightweight (a plus if you're deploying on Raspberry Pis), easy to use, intuitive, has the features I need, and has been supremely extensible whenever I've needed to add features of my own. Many plugins are available.
Don't use Bottle's built-in HTTP server for anything but dev.
I've run Bottle in production with a lot of success; it's been very stable on Apache/mod_wsgi. nginx/uwsgi "should" work similarly but I don't have experience with it.
I also suggest you look at running Bottle via the gevent.pywsgi server. It's awesome, super simple to set up, asynchronous, and very fast.
Plus Bottle has an adapter built in for it already, so it's even easier.
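A minimal sketch of that setup (the /ping route is just an example; the monkey-patching step is what the Bottle docs recommend when running on gevent):

from gevent import monkey; monkey.patch_all()

from bottle import Bottle, run

app = Bottle()

@app.route('/ping')
def ping():
    return {'status': 'ok'}  # Bottle serializes dicts to JSON

# server='gevent' makes Bottle use gevent.pywsgi under the hood
run(app, server='gevent', host='0.0.0.0', port=8080)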
I love Bottle, and the notion that it is not meant for large projects is ridiculous. It's one of the most efficient and well-written frameworks, and it can be easily molded without a lot of hand-wringing.

Python Web Backend

I am an experienced Python developer starting to work on a web service backend system. The system feeds data (constantly) from the web to a MySQL database. This data is later displayed by a frontend side (there is no connection between the frontend and the backend). The backend system constantly downloads flight information from the web (some of the data is fetched via APIs, and some by downloading and parsing text / xls files). I already have a script that downloads the data, parses it, and inserts it into the MySQL db, all in a big loop. The frontend side is just a bunch of PHP pages that properly display the data by querying the MySQL server.
It is crucial that this web service be robust, strong and reliable.
Therefore, I have been looking into the proper ways to design it, and came across the following parts to comprise my system:
1) Django as a framework (for HTTP connections and for using Piston)
2) Piston as an API provider (this is great because then my front-end can use the API instead of actually running queries)
3) SQLAlchemy as the DB layer (I don't like how little control you get when using the Django ORM; I want to be able to run a more complex DB framework)
4) Apache with mod_wsgi to run everything
5) And finally, Celery (or django-cron) to actually run my infinite loop that pulls the data off the web, hopefully in some sort of organized task format. This is the part I am least sure of, and any pointers are appreciated.
This all sounds great. I used Django before to write websites (aka request handlers that return data). However, other than using Celery or django-cron, I can't really see how it fits the role of a constant data-feeding backend.
I just wanted to run this by you guys to hear your ideas / comments. Any input you have / pointers to documentation and/or other libraries would be greatly greatly appreciated!
If you are about to use SQLAlchemy, I would refrain from using Django: Django is fine if you are using the whole stack, but as you are about to rip the models off, I do not see much value in using it, and I would take a look at other options (perhaps Pylons or plain old CherryPy would do).
Even more so if the frontends will not run queries, but only ask API providers.
As for robustness, I am more satisfied with starting separate FCGI processes with supervise and using a more lightweight web server (lighty / nginx), but that's a matter of taste.
For the "infinite loop" part, it depends on what behavior you want: if there is a problem with the source, would you just like to skip the step or repeat it multiple times when source is back up?
Periodic Tasks might be good for former, while cron that would just spawn scraping tasks is better for latter.
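For illustration, a minimal Celery sketch covering both behaviors: retries for the repeat-until-back-up case, and a periodic schedule for the skip-on-failure case. The download and store helpers are hypothetical, the broker URL is an assumption, and the task name assumes this module is feeder.py:

from celery import Celery
from celery.schedules import crontab

app = Celery('feeder', broker='redis://localhost:6379/0')

@app.task(bind=True, max_retries=5, default_retry_delay=60)
def fetch_flights(self, source_url):
    try:
        data = download(source_url)  # hypothetical fetch helper
    except IOError as exc:
        # repeat the step when the source is back up
        raise self.retry(exc=exc)
    store(data)                      # hypothetical MySQL insert helper

# skip-on-failure variant: just schedule the task periodically and
# let a failed run be picked up again at the next tick
app.conf.beat_schedule = {
    'fetch-every-5-minutes': {
        'task': 'feeder.fetch_flights',
        'schedule': crontab(minute='*/5'),
        'args': ('http://example.com/flights.xls',),
    },
}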

Database for Python Twisted

There's an API for Twisted apps to talk to a database in a scalable way: twisted.enterprise.adbapi
The confusing thing is, which database to pick?
The database will have a Twisted app that is mostly making inserts and updates and relatively few selects, and then other strictly-read-only clients that are accessing the database directly making selects.
(The read-only users are not necessarily selecting the data that the Twisted app is inserting; it's not as though the database is being used as a message queue.)
My understanding - which I'd like corrected/advised - is that:
Postgres is a great DB, but almost all the Python bindings - and there is a confusing maze of them - are abandonware
There is psycopg2 for postgres, but that makes a lot of noise about doing its own connection-pooling and things; does this co-exist gracefully/usefully/transparently with the Twisted async database connection pooling and such?
SQLite is a great database for little things, but if used in a multi-user way it does whole-database locking, so performance would suck in the usage pattern I envisage; it also has different mechanisms for typing column values?
MySQL - after the Oracle takeover, who'd want to adopt it now or adopt a fork?
Is there anything else out there?
Scalability
twisted.enterprise.adbapi isn't necessarily an interface for talking to databases in a scalable way. Scalability is a problem you get to solve separately. The only thing twisted.enterprise.adbapi really claims to do is let you use DB-API 2.0 modules without the blocking that normally implies.
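For example, a minimal adbapi sketch (the connection parameters and the flight table are placeholders):

from twisted.enterprise import adbapi
from twisted.internet import reactor

# the blocking psycopg2 calls run in a thread pool behind the scenes
pool = adbapi.ConnectionPool('psycopg2', dbname='flights', user='app')

def show(rows):
    for row in rows:
        print(row)

# runQuery returns a Deferred instead of blocking
d = pool.runQuery("SELECT id, origin FROM flight WHERE origin = %s", ('SFO',))
d.addCallback(show)
d.addBoth(lambda _: reactor.stop())
reactor.run()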
Postgres
Yes. This is the correct answer. I don't think all of the Python bindings are abandonware - psycopg2, for example, seems to be actively maintained. In fact, they just added some new bindings for async access, for which Twisted might eventually offer an interface.
SQLite3 is pretty cool too. You might want to make it possible to use either Postgres or SQLite3 in your app; your unit tests will definitely be happier running against SQLite3, for example, even if you want to deploy against Postgres.
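One way to keep both usable is to pick the DB-API module at configuration time; a sketch, with the caveat that sqlite3 needs check_same_thread=False and a single connection (cp_max=1) to be safe with adbapi's thread pool:

import os
from twisted.enterprise import adbapi

if os.environ.get('USE_SQLITE'):
    # in-process database for unit tests
    pool = adbapi.ConnectionPool('sqlite3', 'test.db',
                                 check_same_thread=False, cp_max=1)
else:
    # production deployment against Postgres (placeholder credentials)
    pool = adbapi.ConnectionPool('psycopg2', dbname='flights', user='app')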
Other?
It's hard to know if another database entirely (something non-relational, perhaps) would fit your application better than Postgres. That depends a lot on the specific data you're going to be storing and the queries you need to run against it. If there are interesting relationships in your database, Postgres does seem like a pretty good answer. If all your queries look like "SELECT foo, bar FROM baz" though, there might be a simpler, higher performance option.
There is the txpostgres library, which is a drop-in replacement for twisted.enterprise.adbapi; instead of a thread pool and blocking DB I/O, it is fully asynchronous, leveraging the built-in async capabilities of psycopg2.
We are using it in production in a big corporation and it's been serving us very well so far. Also, it's actively developed; a bug we reported recently was fixed very quickly.
http://pypi.python.org/pypi/txpostgres
https://github.com/wulczer/txpostgres
You could look at NoSQL databases like MongoDB or CouchDB with Twisted.
Scaling out could be rather easier with NoSQL-based databases than with MySQL or Postgres.
