How Google App Engine limit Python? - python

Does anybody know, how GAE limit Python interpreter? For example, how they block IO operations, or URL operations.
Shared hosting also do it in some way?

The sandbox "internally works" by them having a special version of the Python interpreter. You aren't running the standard Python executable, but one especially modified to run on Google App engine.
Update:
And no it's not a virtual machine in the ordinary sense. Each application does not have a complete virtual PC. There may be some virtualization going on, but Google isn't saying exactly how much or what.
A process has normally in an operating system already limited access to the rest of the OS and the hardware. Google have limited this even more and you get an environment where you are only allowed to read the very specific parts of the file system, and not write to it at all, you are not allowed to open sockets and not allowed to make system calls etc.
I don't know at which level OS/Filesystem/Interpreter each limitation is implemented, though.

From Google's site:
An application can only access other
computers on the Internet through the
provided URL fetch and email
services. Other computers can only
connect to the application by making
HTTP (or HTTPS) requests on the
standard ports.
An application cannot write to the
file system. An app can read files,
but only files uploaded with the
application code. The app must use
the App Engine datastore, memcache or
other services for all data that
persists between requests.
Application code only runs in
response to a web request, a queued
task, or a scheduled task, and must
return response data within 30
seconds in any case. A request
handler cannot spawn a sub-process or
execute code after the response has
been sent.
Beyond that, you're stuck with Python 2.5, you can't use any C-based extensions, more up-to-date versions of web frameworks won't work in some cases (Python 2.5 again).
You can read the whole article What is Google App Engine?.

I found this site
that has some pretty decent information. What exactly are you trying to do?
Here
FRESH!
Look here: http://code.google.com/appengine/docs/python/runtime.html
Your IO Operations are limited as follows (beyond disabled modules):
App Engine records how much of each resource an application uses in a calendar day, and considers the resource depleted when this amount reaches the app's quota for the resource. A calendar day is a period of 24 hours beginning at midnight, Pacific Time. App Engine resets all resource measurements at the beginning of each day, except for Stored Data which always represents the amount of datastore storage in use.
When an app consumes all of an allocated resource, the resource becomes unavailable until the quota is replenished. This may mean that your app will not work until the quota is replenished.
An application can determine how much CPU time the current request has taken so far by calling the Quota API. This is useful for profiling CPU-intensive code, and finding places where CPU efficiency can be improved for greater cost savings. You can measure the CPU used for the entire request, or call the API before and after a section of code then subtract to determine the CPU used between those two points.
Resource| Free Default Quota| Billing Enabled Default Quota
Blobstore |Stored Data| 1 GB| 1 GB free; no maximum
Resource |Billing Enabled| Default Quota
Daily Limit| Maximum Rate
Blobstore API Calls |140,000,000 calls| 72,000 calls/minute
Hmm my table isn't that good, but hopefully still readable.
EDIT: OK, I understand. But sir, you did not have to use the "f" word. :) And you know, it's kinda like the whole 'teach a man to fish' scenario. Google is who I always ask and that's why I'm answering questions here for fun.
EDIT AGAIN: OK that made more sense before the comment was tooked. So I went and answered the question a little more. I hope it helps.

IMO it's not a standard python, but a version specifically patched for app engine. In other words you can think more or less like an "higher level" VM that however is not emulating x86 instructions but python opcodes (if you don't know what they are try writing a small function named "foo" and the doing "import dis; dis.dis(foo)" you will see the python opcodes that the compiler produced).
By patching python you can impose to it whatever limitations you like. Of course you've however to forbid the use of user supplied C/C++ extension modules as a C/C++ module will have access to everything the process can access.
Using such a virtual environment you're able to run safely python code without the need to use a separate x86 VM for every instance.

Related

How to integrate BIRT with Python Django Project by using Py4j

Hi is there anyone who is help me to Integrate BIRT report with Django Projects? or any suggestion for connect third party reporting tools with Django like Crystal or Crystal Clear Report.
Some of the 3rd-party Crystal Reports viewers listed here provide a full command line API, so your python code can preview/export/print reports via subprocess.call()
The resulting process can span anything between an interactive Crystal Report viewer session (user can login, set/change parameters, print, export) and an automated (no user interaction) report printing/exporting.
While this would simplify your code, it would restrict deployment to Windows.
For prototyping, or if you don't mind performance, you can call from BIRT from the command line.
For example, download the POJO runtime and use the script genReport.bat (IIRC) to generate a report to a file (eg. PDF format). You can specify the output options and the report parameters on the command line.
However, the BIRT startup is heavy overhead (several seconds).
For achieving reasonable performance, it is much better to perform this only once.
To achieve this goal, there are at least two possible ways:
You can use the BIRT viewer servlet (which is included as a WAR file with the POJO runtime). So you start the servlet with a web server, then you use HTTP requests to generate reports.
This looks technically old-fashioned (eg. no JSON Requests), but it should work. However, I never used this approach.
The other option is to write your own BIRT server.
In our product, we followed this approach.
You can take the viewer servlet as a template for seeing how this could work.
The basic idea is:
You start one (or possibly more than one) Java process.
The Java process initializes the BIRT runtime (this is what takes some seconds).
After that, the Java process listens for requests somehow (we used a plain socket listener, but of course you could use HTTP or some REST server framework as well).
A request would contain the following information:
which module to run
which output format
report parameters (specific to the module)
possibly other data/metadata, e.g. for authentication
This would create a RunAndRenderTask or separate RunTask and RenderTasks.
Depending on your reports, you might consider returning the resulting output (e.g. PDF) directly as a response, or using an asynchronous approach.
Note that BIRT will happily create several reports at the same time - multi-threading is no problem (except for the initialization), given enough RAM.
Be warned, however, that you will need at least a few days to build a POC for this "create your own server" approach, and probably some weeks for prodction quality.
So if you just want to build something fast to see if the right tool for you, you should start with the command line approach, then the servlet approach and only then, and only if you find that the servlet approach is not quite good enough, you should go the "create your own server" way.
It's a pity that currently there doesn't seem to exist an open-source, production-quality, modern BIRT REST service.
That would make a really good contribution to the BIRT open-source project... (https://github.com/eclipse/birt)

what are the cost/perfomance advantages of using "golang" in GAE

In terms of the Quotas/Usage limits per instance, is there any considerable improvement/advantage when using golang in Google appengine GAE instead of other offered language that run within GAE like python, java,php or all of them behave the same?
Or basically any instance no matter the language in use, behave the same way and can handle barely the same amount of maximum requests/sec per instance considering that this concerns more to the "GAE load balancer" or infrastructure, rather than the used programming language, same logic could applied to the memory,cpu usage?
App Engine doesn't have explicit limits or restrictions that would apply only when using a specific language. However the languages and their technologies might imply certain limitations, for example a Java Virtual Machine instance by itself requires significantly more memory and has significantly higher startup time (even when warmup requests are enabled) than starting the built-in web server of Go, so in case of a Java instance less memory will remain for the webapp itself to allocate and use (for a specific plan/type and instance).
I don't have concrete measures to compare, but (in case of Go):
"Code is deployed in source form and compiled in the cloud... Go is the first true compiled language that runs on App Engine. Go on App Engine makes it possible to deploy efficient, CPU-intensive web applications". (source)
If you think about it, other languages at App Engine are all interpreted (including Java which is byte code interpreted by a Virtual Machine) while Go is compiled into and runs as platform dependent native code. This should already tell something about performance.
For a "case-study" check out the following blog post:
From zero to Go: launching on the Google homepage in 24 hours
This blog also contains some performance report of a real-world app used by millions:
This chart - taken directly from the App Engine dashboard - shows average request latency during launch. As you can see, even under load it never exceeds 60 ms, with a median latency of 32 milliseconds. This is wicked fast, considering that our request handler is doing image manipulation and encoding on the fly.
App Engine uses the web server that is included in the Go standard library to serve your app, so that also means you can easily port a Go web app to App Engine, and that you know exactly what to expect from the web server serving your app on App Engine.
Found Official time comparisions of Python, Java and Go
The App Engine System Status can be considered official and a good comparision base.
You can click on any cells belonging to a specific day and language, and you get detailed historical statistics for Static and Dynamic GET latency (both secure and unsecure), Error rates, CPU usage/latency. These statistics are measured on an instance that is already up and ready to serve.
Analysing it for the day of January 27, 2015 here are the conclusions for Go, Java and Python:
Dynamic latency is roughly the same for all
CPU latency (to compute the 33rd Fibonacci number) is best for Java, then Go and slowest is Python.
Static file serving time is roughly the same but Go is fastest.

Python bottle vs uwsgi/bottle vs nginx/uwsgi/bottle

I am developing a Python based application (HTTP -- REST or jsonrpc interface) that will be used in a production automated testing environment. This will connect to a Java client that runs all the test scripts. I.e., no need for human access (except for testing the app itself).
We hope to deploy this on Raspberry Pi's, so I want it to be relatively fast and have a small footprint. It probably won't get an enormous number of requests (at max load, maybe a few per second), but it should be able to run and remain stable over a long time period.
I've settled on Bottle as a framework due to its simplicity (one file). This was a tossup vs Flask. Anybody who thinks Flask might be better, let me know why.
I have been a bit unsure about the stability of Bottle's built-in HTTP server, so I'm evaluating these three options:
Use Bottle only -- As http server + App
Use Bottle on top of uwsgi -- Use uwsgi as the HTTP server
Use Bottle with nginx/uwsgi
Questions:
If I am not doing anything but Python/uwsgi, is there any reason to add nginx to the mix?
Would the uwsgi/bottle (or Flask) combination be considered production-ready?
Is it likely that I will gain anything by using a separate HTTP server from Bottle's built-in one?
Flask vs Bottle comes down to a couple of things for me.
How simple is the app. If it is very simple, then bottle is my choice. If not, then I got with Flask. The fact that bottle is a single file makes it incredibly simple to deploy with by just including the file in our source. But the fact that bottle is a single file should be a pretty good indication that it does not implement the full wsgi spec and all of its edge cases.
What does the app do. If it is going to have to render anything other than Python->JSON then I go with Flask for its built in support of Jinja2. If I need to do authentication and/or authorization then Flask has some pretty good extensions already for handling those requirements. If I need to do caching, again, Flask-Cache exists and does a pretty good job with minimal setup. I am not entirely sure what is available for bottle extension-wise, so that may still be worth a look.
The problem with using bottle's built in server is that it will be single process / single thread which means you can only handle processing one request at a time.
To deal with that limitation you can do any of the following in no particular order.
Eventlet's wsgi wrapping the bottle.app (single threaded, non-blocking I/O, single process)
uwsgi or gunicorn (the latter being simpler) which is most ofter set up as single threaded, multi-process (workers)
nginx in front of uwsgi.
3 is most important if you have static assets you want to serve up as you can serve those with nginx directly.
2 is really easy to get going (esp. gunicorn) - though I use uwsgi most of the time because it has more configurability to handle some things that I want.
1 is really simple and performs well... plus there is no external configuration or command line flags to remember.
2017 UPDATE - We now use Falcon instead of Bottle
I still love Bottle, but we reached a point last year where it couldn't scale to meet our performance requirements (100k requests/sec at <100ms). In particular, we hit a performance bottleneck with Bottle's use of thread-local storage. This forced us to switch to Falcon, and we haven't looked back since. Better performance and a nicely designed API.
I like Bottle but I also highly recommend Falcon, especially where performance matters.
I faced a similar choice about a year ago--needed a web microframework for a server tier I was building out. Found these slides (and the accompanying lecture) to be very helpful in sifting through the field of choices: Web micro-framework BATTLE!
I chose Bottle and have been very happy with it. It's simple, lightweight (a plus if you're deploying on Raspberry Pis), easy to use, intuitive, has the features I need, and has been supremely extensible whenever I've needed to add features of my own. Many plugins are available.
Don't use Bottle's built-in HTTP server for anything but dev.
I've run Bottle in production with a lot of success; it's been very stable on Apache/mod_wsgi. nginx/uwsgi "should" work similarly but I don't have experience with it.
I also suggest you look at running bottle via gevent.pywsgi server. It's awesome, super simple to setup, asynchronous, and very fast.
Plus bottle has an adapter built for it already, so even easier.
I love bottle, and this concept that it is not meant for large projects is ridiculous. It's one of the most efficient and well written frameworks, and can be easily molded without a lot of hand wringing.

Google App Engine development server random (?) slowdowns

I'm doing a small web application which might need to eventually scale somewhat, and am curious about Google App Engine. However, I am experiencing a problem with the development server (dev_appserver.py):
At seemingly random, requests will take 20-30 seconds to complete, even if there is no hard computation or data usage. One request might be really quick, even after changing a script of static file, but the next might be very slow. It seems to occur more systematically if the box has been left for a while without activity, but not always.
CPU and disk access is low during the period. There is not allot of data in my application either.
Does anyone know what could cause such random slowdowns? I've Google'd and searched here, but need some pointers.. /: I've also tried --clear_datastore and --use_sqlite, but the latter gives an error: DatabaseError('file is encrypted or is not a database',). Looking for the file, it does not seem to exist.
I am on Windows 8, python 2.7 and the most recent version of the App Engine SDK.
Don't worry about it. It (IIRC) keeps the whole DB (datastore) in memory using a "emulation" of the real thing. There are lots of other issues that you won't see when deployed.
I'd suggest that your hard drive is spinning down and the delay you see is it taking a few seconds to wake back up.
If this becomes a problem, develop using the deployed version. It's not so different.
Does this happen in all web browsers? I had issues like this when viewing a local app engine dev site in several browsers at the same time for cross-browser testing. IE would then struggle, with requests taking about as long as you describe.
If this is the issue, I found the problems didn't occur with IETester.
Sorry if it's not related, but I thought this was worth mentioning just in case.

Is it more efficient to parse external XML or to hit the database?

I was wondering when dealing with a web service API that returns XML, whether it's better (faster) to just call the external service each time and parse the XML (using ElementTree) for display on your site or to save the records into the database (after parsing it once or however many times you need to each day) and make database calls instead for that same information.
First off -- measure. Don't just assume that one is better or worse than the other.
Second, if you really don't want to measure, I'd guess the database is a bit faster (assuming the database is relatively local compared to the web service). Network latency usually is more than parse time unless we're talking a really complex database or really complex XML.
Everyone is being very polite in answering this question: "it depends"... "you should test"... and so forth.
True, the question does not go into great detail about the application and network topographies involved, but if the question is even being asked, then it's likely a) the DB is "local" to the application (on the same subnet, or the same machine, or in memory), and b) the webservice is not. After all, the OP uses the phrases "external service" and "display on your own site." The phrase "parsing it once or however many times you need to each day" also suggests a set of data that doesn't exactly change every second.
The classic SOA myth is that the network is always available; going a step further, I'd say it's a myth that the network is always available with low latency. Unless your own internal systems are crap, sending an HTTP query across the Internet will always be slower than a query to a local DB or DB cluster. There are any number of reasons for this: number of hops to the remote server, outage or degradation issues that you can't control on the remote end, and the internal processing time for the remote web service application to analyze your request, hit its own persistence backend (aka DB), and return a result.
Fire up your app. Do some latency and response times to your DB. Now do the same to a remote web service. Unless your DB is also across the Internet, you'll notice a huge difference.
It's not at all hard for a competent technologist to scale a DB, or for you to completely remove the DB from caching using memcached and other paradigms; the latency between servers sitting near each other in the datacentre is monumentally less than between machines over the Internet (and more secure, to boot). Even if achieving this scale requires some thought, it's under your control, unlike a remote web service whose scaling and latency are totally opaque to you. I, for one, would not be too happy with the idea that the availability and responsiveness of my site are based on someone else entirely.
Finally, what happens if the remote web service is unavailable? Imagine a world where every request to your site involves a request over the Internet to some other site. What happens if that other site is unavailable? Do your users watch a spinning cursor of death for several hours? Do they enjoy an Error 500 while your site borks on this unexpected external dependency?
If you find yourself adopting an architecture whose fundamental features depend on a remote Internet call for every request, think very carefully about your application before deciding if you can live with the consequences.
Consuming the webservices is more efficient because there are a lot more things you can do to scale your webservices and webserver (via caching, etc.). By consuming the middle layer, you also have the options to change the returned data format (e.g. you can decide to use JSON rather than XML). Scaling database is much harder (involving replication, etc.) so in general, reduce hits on DB if you can.
There is not enough information to be able to say for sure in the general case. Why don't you do some tests and find out? Since it sounds like you are using python you will probably want to use the timeit module.
Some things that could effect the result:
Performance of the web service you are using
Reliability of the web service you are using
Distance between servers
Amount of data being returned
I would guess that if it is cacheable, that a cached version of the data will be faster, but that does not necessarily mean using a local RDBMS, it might mean something like memcached or an in memory cache in your application.
It depends - who is calling the web service? Is the web service called every time the user hits the page? If that's the case I'd recommend introducing a caching layer of some sort - many web service API's throttle the amount of hits you can make per hour.
Whether you choose to parse the cached XML on the fly or call the data from a database probably won't matter (unless we are talking enterprise scaling here). Personally, I'd much rather make a simple SQL call than write a DOM Parser (which is much more prone to exceptional scenarios).
It depends from case to case, you'll have to measure (or at least make an educated guess).
You'll have to consider several things.
Web service
it might hit database itself
it can be cached
it will introduce network latency and might be unreliable
or it could be in local network and faster than accessing even local disk
DB
might be slow since it needs to access disk (although databases have internal caches, but those are usually not targeted)
should be reliable
Technology itself doesn't mean much in terms of speed - in one case database parses SQL, in other XML parser parses XML, and database is usually acessed via socket as well, so you have both parsing and network in either case.
Caching data in your application if applicable is probably a good idea.
As a few people have said, it depends, and you should test it.
Often external services are slow, and caching them locally (in a database in memory, e.g., with memcached) is faster. But perhaps not.
Fortunately, it's cheap and easy to test.
Test definitely. As a rule of thumb, XML is good for communicating between apps, but once you have the data inside of your app, everything should go into a database table. This may not apply in all cases, but 95% of the time it has for me. Anytime I ever tried to store data any other way (ex. XML in a content management system) I ended up wishing I would have just used good old sprocs and sql server.
It sounds like you essentially want to cache results, and are wondering if it's worth it. But if so, I would NOT use a database (I assume you are thinking of a relational DB): RDBMSs are not good for caching; even though many use them. You don't need persistence nor ACID.
If choice was between Oracle/MySQL and external web service, I would start with just using service.
Instead, consider real caching systems; local or not (memcache, simple in-memory caches etc).
Or if you must use a DB, use key/value store, BDB works well. Store response message in its serialized form (XML), try to fetch from cache, if not, from service, parse. Or if there's a convenient and more compact serialization, store and fetch that.

Categories