Distributed system: Python 3 worker and Node.js server?

I'm looking to set up a distributed system where there are compute/worker machines running resource-heavy Python 3 code, and there is a single web server that serves the results of the Python computation to clients. I would very much like to write the web server in Node.js.
I've looked into using an RPC framework—specifically, this question led me to ZeroRPC, but it's not compatible with Python 3 (the main issue is that it requires gevent, which isn't that close to a Python 3 version yet). There doesn't seem to be another viable option for Python–Node.js RPC as far as I can tell.
In light of that, I'm open to using something other than RPC, especially since I've read that the RPC strategy hides too much from the programmer.
I'm also open to using a different language for the web server if that really makes more sense; for example, it may be much simpler from a development point of view to just use Python for the server too.
How can I achieve this?

You have a few options here.
First, it sounds like you like ZeroRPC, and your only problem is that it depends on gevent, which is not 3.x-ready yet.
Well, gevent is close to 3.x-ready. There are a few forks of it that people are testing and even using, which you can see on issue #38. As of mid-September 2014 the one that seems to be getting the most traction is Michal Mazurek's. If you're lucky, you can just do this:
pip3 install git+https://github.com/MichalMazurek/gevent
pip3 install ZeroRPC
Or, if ZeroRPC has metadata that says it's Python 2-only, you can install it from its repo the same way as gevent.
The downside is that none of the gevent-3.x forks are quite battle-tested yet, which is why none of them have been accepted upstream and released. But if you're not in a huge hurry, and you're willing to take a risk, there's a pretty good chance you can start with a fork today and switch to the final version when it's released, hopefully before you've reached 1.0 yourself.
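If one of those forks does install cleanly, the worker side stays very small. Here's a rough sketch following ZeroRPC's usual pattern; the class name, method, and port are placeholders, not anything from your project:

import zerorpc

class Worker(object):
    def compute(self, n):
        # stand-in for the resource-heavy computation
        return sum(i * i for i in range(n))

server = zerorpc.Server(Worker())
server.bind("tcp://0.0.0.0:4242")
server.run()

The Node.js web server would then point a ZeroRPC client at the same tcp endpoint and call compute remotely.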
Second, ZeroRPC is certainly not the only RPC library available for either Python or Node. And most of them have a similar kind of interface for exposing methods over RPC. And, while you may ultimately need something like ZeroMQ for scalability or deployment reasons, you can probably use something simpler and more widespread like JSON-RPC over HTTP—which has a dozen or more Python and Node implementations—for early development, then switch to ZeroRPC later.
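To make that concrete, here is a hand-rolled sketch of JSON-RPC over HTTP using only the standard library - in practice you would pick one of the existing JSON-RPC packages, and the method table below is just a placeholder for your real worker functions:

import json
from http.server import BaseHTTPRequestHandler, HTTPServer

METHODS = {                        # placeholder worker functions
    "add": lambda a, b: a + b,
    "echo": lambda msg: msg,
}

class RPCHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        length = int(self.headers["Content-Length"])
        request = json.loads(self.rfile.read(length).decode())
        result = METHODS[request["method"]](*request.get("params", []))
        body = json.dumps({"jsonrpc": "2.0", "result": result,
                           "id": request.get("id")}).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

HTTPServer(("", 8000), RPCHandler).serve_forever()

The Node server can then POST something like {"jsonrpc": "2.0", "method": "add", "params": [2, 3], "id": 1} to port 8000 with any HTTP client.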
Third, RPC isn't exactly complicated, and binding methods to RPCs the way most libraries do isn't that hard. Making it asynchronous can be tricky, but again, for early development you can just use an easy but nonscalable solution—creating a thread for each request—and switch to something else later. (Of course that solution is only easy if your service is stateless; otherwise you're just eliminating all of your async problems and replacing them with race condition problems…)
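For the thread-per-request idea, the standard library already has it built in via socketserver; a minimal sketch, where the echo handler is a stand-in for real RPC dispatch and the port is arbitrary:

import socketserver

class EchoHandler(socketserver.StreamRequestHandler):
    def handle(self):
        line = self.rfile.readline().strip()        # one request per connection
        self.wfile.write(b"echo: " + line + b"\n")  # stand-in for dispatching to an RPC method

server = socketserver.ThreadingTCPServer(("", 9000), EchoHandler)
server.daemon_threads = True   # don't let worker threads block shutdown
server.serve_forever()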

ZeroMQ offers several transport classes; the first two below are best suited to a heterogeneous RPC layer:
ipc://
tcp://
pgm://
epgm://
inproc://
ZeroMQ has bindings for both languages, so it will definitely serve your projected needs as well.
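For illustration, a minimal pyzmq REQ/REP worker over the tcp:// transport might look like this; the endpoint and message format are made up, and the Node server would connect a matching REQ socket to the same address:

import zmq

context = zmq.Context()
socket = context.socket(zmq.REP)
socket.bind("tcp://*:5555")   # placeholder endpoint

while True:
    request = socket.recv_json()                      # e.g. {"n": 12}
    socket.send_json({"result": request["n"] ** 2})   # stand-in for the heavy computation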

Related

Fabric vs Plumbum: differences, use cases, pros and cons

What are the pros and cons of the Fabric and Plumbum Python libraries for local/remote command execution? In which use cases should one library be used rather than the other? What differences should attention be drawn to?
background and suggested comparison methodology
(oops it's a dead post)
Both tools are fun and allow either local or remote work, but they differ in the problems they are meant to solve, i.e. their "terminology", and both have been largely made obsolete by modern deployment/automation tooling (like Ansible, and many others that chose the DSL way, e.g. Terraform).
Their advantage over the more modern tools is the lack of an "opinionated" approach about the "how"; they focus more on the "what".
Suggested comparison criteria:
"Pythonness" vs. "Shellness" (i.e. how "pythonic" the user code with each is)
Special Capabilities
ROI with 2 types of maintainers of your "automation" code (ops vs. devs, let's put "QA" as something in between)
Fabric (my last work with it was on 1.8, so take this with a grain of salt given how much time has passed):
More Pythonic than shellish, which means it is easy to support with both old and new tools - i.e. editors and IDEs are easy to set up
Many, many context managers and decorators; very nice
Easier for developers to adopt; getting traction with ops people takes a bit more effort
Plumbum
The user code can be either pythonic or shellish
"shell combinators" are a killer feature to get senior shell/perl folk onboard, but it uses dynamic imports, so editors/IDEs are a bit trickier to setup.
Because of 1, you will get 'ops' people on board more easily, since Plumbum mimics shell constructs, but please put good coding conventions in place.
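To give a flavour of those shell combinators, here is a tiny Plumbum sketch; the commands and pipeline are arbitrary, and the plumbum.cmd imports are the dynamically resolved names that trip up some editors:

from plumbum import local
from plumbum.cmd import grep, wc      # these names are resolved dynamically at import time

chain = local["cat"]["/etc/passwd"] | grep["-v", "nologin"] | wc["-l"]
print(chain())                        # runs the whole pipeline and returns its stdout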
Epilogue
Having worked with both toolkits (with lots of fun) and then having switched to Ansible, I feel confident claiming that both tools are now superseded by Ansible.
You can do most automation tasks with existing Ansible modules, and for what you can't, you can write a plugin or module (in any language), or just call the shell module.
My consideration would be this:
if your team of maintainers has a good level of programming skill (especially in Python) as a requirement - you'd be OK using either Fabric, Plumbum (it has more cool hacks ;)) or Ansible.
if you have a multi-level, multi-team organization, I would simply bet on Ansible - it has a lower learning curve and allows you to grow easily.
Good day.
They're pretty much the same thing. The biggest win for fabric over plumbum is the ability to connect to multiple hosts in parallel, which is more or less indispensable when you're working with a non-trivial setup. fabric also offers a couple of contrib helpers that let you upload Jinja templates, upload files, and transfer files back to the local system. I personally find the fabric api to be far more intuitive for working with remote servers.
YMMV, of course, but both are geared towards being very close to shell commands. That said, my team and I are focused on ansible for most configuration / deploy flows. Fabric does offer some power over ansible at the expense of having to roll your own idempotence.
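As an illustration of the parallel execution mentioned above, a minimal Fabric 1.x fabfile might look like this; the host names and task are placeholders:

from fabric.api import env, run, parallel

env.hosts = ["web1.example.com", "web2.example.com"]   # placeholder hosts

@parallel
def uptime():
    run("uptime")   # executed on every host in env.hosts concurrently

You would then invoke it with fab uptime from the directory containing the fabfile.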

Is pywebsocket good for a production environment?

Is it a good idea to use pywebsocket in a production environment, given that its Google Developers page states ...
pywebsocket is intended for testing or experimental purposes.
Moreover what would the specific drawbacks of using it be?
Are there performance drawbacks?
Is it unstable or insecure in some way?
...
Since Mozilla as well as Google use it to test their WebSocket implementations, and it was suggested (for production) in many SO threads, I thought it was a pretty stable basis until I read the docs.
Or am I misinterpreting something, and it is just meant to be especially helpful for testing, while also being suitable for production?
After some research it becomes clear that pywebsocket was developed for testing browser implementations. It has (at least) the drawbacks of being neither secure nor scalable.
So in short: it is not suitable for use in a production environment!
As stated above, it is still a great tool for testing your client-side implementation of WebSockets, probably even "the best" one for that, since the WebSocket implementations of Chrome, Firefox, etc. are tested against it.
As alternatives for production in python you could look at:
Twisted
Tornado
Gist
websockify
Autobahn
For a list of tools for other programming languages, have a look at this SO wiki answer.
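As a taste of one of the alternatives above, a minimal Tornado WebSocket echo server looks roughly like this; the URL path and port are arbitrary choices, not anything Tornado requires:

import tornado.ioloop
import tornado.web
import tornado.websocket

class EchoSocket(tornado.websocket.WebSocketHandler):
    def on_message(self, message):
        self.write_message("echo: " + message)   # echo back whatever the client sent

app = tornado.web.Application([(r"/ws", EchoSocket)])
app.listen(8888)
tornado.ioloop.IOLoop.current().start()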

How to embed a Python interpreter on a website

I am attempting to build an educational coding site, similar to Codecademy, but I am frankly at a loss as to what steps should be taken. Could I be pointed in the right direction for including even a simple Python interpreter in a web app?
One option might be to use PyPy to create a sandboxed python. It would limit the external operations someone could do.
Once you have that set up, your website would take the code source, send it over ajax to your webserver, and the server would run the code in a subprocess of a sandboxed python instance. You would also be able to kill the process if it took longer than say 5 seconds. Then you return the output back as a response to the client.
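The subprocess-with-a-timeout part might look roughly like this; "python3" is a placeholder here - in practice you would point it at the sandboxed PyPy interpreter described in the links below:

import subprocess

def run_user_code(source, timeout=5):
    try:
        proc = subprocess.run(["python3", "-c", source],   # placeholder: swap in the sandbox binary
                              stdout=subprocess.PIPE, stderr=subprocess.STDOUT,
                              universal_newlines=True, timeout=timeout)
        return proc.stdout
    except subprocess.TimeoutExpired:
        return "error: execution exceeded %s seconds" % timeout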
See these links for help on a PyPy sandbox:
http://doc.pypy.org/en/latest/sandbox.html
http://readevalprint.com/blog/python-sandbox-with-pypy.html
Creating a fully interactive REPL would be even more involved. You would need to keep an interpreter alive for each client on your server, then accept ajax "lines" of input, run them through the interpreter by communicating with the running process, and return the output.
Overall, not trivial. You would need some strong dev skills to do this comfortably. You may find this task a bit daunting if you are just learning.
There's more to do here than you think.
The major problem is that you cannot let people run arbitrary Python code on your webserver. For example, what happens if they do
import os
os.system("rm -rf *.*")
So clearly you have to run this Python code securely. But then you have the problem of securing Python, which is basically impossible because of how dynamic it is. And so you'll probably have to run the Python shell in a virtual machine, which comes with its own headaches.
Have you seen e.g. http://code.google.com/p/google-app-engine-samples/downloads/detail?name=shell_20091112.tar.gz&can=2&q=?
One recent option for this is to use Repl.it.
This option is attractive because the compilers are implemented in JavaScript, so compilation and execution happen on the user's side, meaning that the server is free of these vulnerabilities.
They have compilers for Python 3, Python 2, JavaScript, Java, Ruby, PHP...
I strongly recommend checking out their site at http://repl.it
Look into LXC containers. They have a pretty cool API that you can use to create lightweight Linux containers. You could run the subprocess commands inside such a container so that the end user cannot mess with your main server.

What are the reasons for not hosting a compiler on a live server?

Where I currently work we've had a small debate about deploying our Python code to the production servers. I voted to build binary dependencies (like the Python MySQL drivers) on the server itself, just using pip install -r requirements.txt. This was quickly vetoed with no better explanation than "we don't put compilers on the live servers". As a result our deployment process is becoming convoluted and over-engineered simply to avoid this compilation step.
My question is this: What's the reason these days to avoid having a compiler on live servers?
In general, the prevailing wisdom on server installs is that they should be as stripped-down as possible. There are a few motivations for this, but they don't really apply all that directly to your question about a compiler:
Minimize resource usage. GCC might take up a little extra disk space, but probably not enough to matter - and it won't be running most of the time, so CPU/memory usage isn't a big concern.
Minimize complexity. Building on your server might add a few more failure modes to your build process (if you build elsewhere, then at least you will notice something wrong before you go mess with your production server), but otherwise, it won't get in the way.
Minimize attack surface. As others have pointed out, by the time an attacker can make use of a compiler, you're probably already screwed.
At my company, we generally don't care too much if compilers are installed on our servers, but we also never run pip on our servers, for a rather different reason. We're not so concerned about where packages are built, but when and how they are downloaded.
The particularly paranoid among us will take note that pip (and easy_install) will happily install packages from PyPI without any form of authentication (no SSL, no package signatures, ...). Further, many of these aren't actually hosted on PyPI; pip and easy_install follow redirects. So, there are two problems here:
If PyPI - or any of the other sites on which your dependencies are hosted - goes down, then your build process will fail
If an attacker somehow manages to perform a man-in-the-middle attack against your server as it's attempting to download a dependency package, then he'll be able to insert malicious code into the download
So, we download packages when we first add a dependency, do our best to make sure the source is genuine (this is not foolproof), and add them into our own version-control system. We do actually build our packages on a separate build server, but this is less crucial; we simply find it useful to have a binary package we can quickly deploy to multiple instances.
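With a modern pip, one way to approximate that "download once, review, then deploy without touching PyPI" flow is roughly the following (the vendor/ directory name is arbitrary):

pip download -d vendor/ -r requirements.txt                        # fetch and review once, then commit vendor/
pip install --no-index --find-links=vendor/ -r requirements.txt    # deploys never reach the network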
I would suggest referring to this Server Fault post.
It makes sense to avoid exploits being compiled remotely.
It also makes sense to me that, in terms of security, a missing compiler only makes the task harder for a hijacker than having one available would - but it's not perfect.
Would it put a heavy strain on the server?

Suggest a standalone python web framework?

I have a python program that I would like to present as a simple web application. The program currently uses sqlite for storage. I also need to distribute the whole thing to colleagues so having something standalone and easy to start would be ideal (no install if possible). This web app is meant to be used locally, not by multiple users over a network.
Is there a suitable python framework that might fit my needs? I looked at Django so far but it seems a bit heavy-handed for what I need.
Thanks for any suggestions.
I have never tried it myself, but you could try Bottle:
Bottle is a fast, simple and lightweight WSGI micro web-framework for Python. It is distributed as a single file module and has no dependencies other than the Python Standard Library.
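A single-file Bottle app really is only a few lines; here is a minimal sketch, where the route and port are arbitrary and the handler stands in for your sqlite-backed views:

from bottle import Bottle, run

app = Bottle()

@app.route("/")
def index():
    return "Hello from a single-file app"   # placeholder page

run(app, host="localhost", port=8080)       # Bottle's built-in development server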
try http://docs.python.org/library/simplehttpserver.html
As web frameworks are not part of the standard library, you will have to install something in every case. I would propose looking at http://flask.pocoo.org/. It has a built-in WSGI server.
Lots of choices for Python web frameworks! Another is web2py which is designed to work out of the box and allows, but doesn't require, through-the-web development. It is mature and has a strong community and is well-documented.
Tornado as a framework may be a lot more than what you're looking for. However it will meet the requirement of being a completely python based web server. http://tornadoweb.org
I generally just download the source, drop it in /tornado/ of my project and do includes there from the app.
I don't think that any web framework is specifically oriented towards the use case you're talking about; they all assume they are running on a server and that there's a browser on a remote machine accessing them.
A better approach is to think about the HTTP server you'll be using. It's probably preferable to use a server that's as easy to pack and ship as the rest of the python code you'll be using. Now most frameworks provide a 'development' server that's easy to invoke from the command line, but most of them are intended to be "easy for the developer", which often means they are restricted to a single thread. This is bad for deployment because single-threaded servers will always feel a bit sluggish.
CherryPy stands out in contrast, by providing a full featured, embedded server that's easy to configure for many use cases, and is available by default with the rest of the framework. There are probably others, but I haven't used 'em.
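For reference, serving an app on CherryPy's embedded, multi-threaded server is about this much code; the page class and port are placeholders:

import cherrypy

class Root(object):
    @cherrypy.expose
    def index(self):
        return "Served by CherryPy's built-in threaded server"   # placeholder page

cherrypy.config.update({"server.socket_port": 8080})   # arbitrary port
cherrypy.quickstart(Root())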
