I have bunch of python-projects with untrusted WSGI-apps inside them. I need to run them simulatiously and safely. So I need restrictions for directory access, python module usage and limitations for CPU and Memory.
I consider two approaches:
Import via imp-module WSGI-object from defined file, and running it with pysandbox. Now I have SandboxError: Read only object when doing:
self.config = SandboxConfig('stdout')
self.sandbox = Sandbox(self.config)
self.s = imp.get_suffixes()
wsgi_obj = imp.load_module("run", open(path+"/run.py", "r"), path, self.s[2]).app
…
return self.sandbox.call(wsgi_obj, environ, start_response)
Modify Python interpreter, exclude potentially risky modules, run in parallel processes, communicate via ZMQ/Unix sockets. I even don't know where to start here.
What could you recommend?
I would run your applications with gunicorn, with a separate process and configuration for each app, and with user-level permissions (each untrusted app on a different user). Each gunicorn instance would serve on localhost on a user-range port, and nginx or another webserver could connect into them to route and serve them to the web.
Heroku takes this a step further and sandboxes each gunicorn instance (or unicorn or apache or arbitrary other server) in a virtual machine. This is probably the most secure possible way to do things, and definitely the best option for reliably limiting CPU and memory usage, but you may not need to go that far depending on your requirements.
One of the advantages of this kind of approach is that each application can run on a different version of Python if appropriate; with the virtual machine sandbox they can even run on different operating systems entirely.
Edit: To limit memory usage without using a VM sandbox approach, see this question. To limit CPU usage, tweak the gunicorn settings -- spin up one gevent-style worker per core an application is allowed to use.
Edit again: One completely different approach would be to use PyPy's sandboxing mechanism which should be much more secure than CPython plus a sandboxing module. However, I would prefer the guincorn or gunicorn + virtual machine approach.
Related
Problem: .so(shared object) as library in python works well when python calls it and fails in uWSGI-running python(Django) application.
More info: I've build Go module with go build -buildmode=c-shared -o output.so input.go to call it in Python with
from ctypes import cdll
lib = cdll.LoadLibrary('path_to_library/output.so')
When django project is served via uWSGI the request handler that calling Go library freezes, causing future 504 in Nginx. After getting in "so called freeze", uWSGI is locked there and only restarting helps to enliven app. No logs AT ALL! It just freezes.
Everything works correctly when i run in python interpreter on the same machine.
My thoughts: i've tried to debug this and put a lot of log messages in library, but it won't give much info because everything is fine with library(because it works in interpreter). Library loads correctly, because some log messages that i've putted in library. I think it some sort of uWSGI limitation. I don't think putting uwsgi.ini file is somehow helpful.
Additional info:
Go dependencies:
fmt
github.com/360EntSecGroup-Skylar/excelize
log
encoding/json
OS: CentOS 6.9
Python: Python 3.6.2
uWSGI: 2.0.15
What limitations can be in uWSGI in that type of shared object work and if there a way to overcome them?
Firstly, are you absolutely positive you need to call Go as a library from uWSGI process?
uWSGI are usually for interpreted languages such as PHP, Python, Ruby and others. It bootstraps the interpreter and manages the master/worker processes to handle requests. It seems strange to be using it on Go library.
You mentioned having nginx as your webserver, why not just use your Go program as the http server (which it does great) and call it from nginx directly using it's URL:
location /name/ {
proxy_pass http://127.0.0.1/go_url/;
}
See nginx docs.
If you really want to use Go as a python imported library via a .so module, you have to be aware Go has its own runtime, thread management and may not work well with uWSGI which handles threads/processes in a different way. In this case I'm unable to help you, since I never actually tried this.
If you could clarify your question with what are you actually tring to do, we might me able to answer more helpfully.
Attempts and thoughts
I tried to avoid separation of shared library from my python code, since it requires support of at least one more process, and i would have to rewrite some of the library to create new api.
As #Lu.nemec kindly noted that:
Go has its own runtime, thread management and may not work well with uWSGI which handles threads/processes in a different way
Since uWSGI is the problem i started seaching for a solution there. One of the hopes was installing GCCGO uWSGI plugin somehow solve that problem. But even it's hard to install on old OSes, because it lacks of pre-builded plugins and manual build haven't gone very well, it haven't helped, nothing changes, it still freezes.
And then i thought that i wan't to disable coroutines and that type of stuff that differs from uWSGI and one of the changes that i am able to do is to set GOMAXPROCS
GOMAXPROCS sets the maximum number of CPUs that can be executing simultaneously and returns the previous setting. If n < 1, it does not change the current setting. The number of logical CPUs on the local machine can be queried with NumCPU. This call will go away when the scheduler improves.
And it worked like a charm!!!
The solution
import (
...
"runtime"
)
...
//export yourFunc
func yourFunc(yourArgs argType) {
runtime.GOMAXPROCS(1)
...
}
My previous answer works in some cases. HOWEVER, when i tried to run the same project on another server with same OS, same Python, same uWSGI (version, plugins, config files), same requirements, same .so file, it freezed the save way as i described in answer.
I personally didn't want to run this as separate process and bind it to socket/port and create API for communicating with shared library.
The solution:
Required only separate process. Run with celery.
Some caveats:
1.You cannot run task with task.apply() since it would be run in main application, not in celery:
result = task.apply_async()
while result.ready():
time.wait(5)
2.You neeed to run celery with solo execution pool
celery -A app worker -P solo
I have a Python web application (using WSGI) deployed on Openshift. The application is quite memory greedy. What I have noticed is that there are several instance of Apache httpd service deployed at all times. That means the memory usage of my gear is multiplied by the number of these processes and the application crashes pretty often.
I don't have lots of traffic yet, so there is no need to have multiple httpd running.
Is there any way to configure Python cartridge to limit it to a single httpd process?
If you are using the OpenShift Python cartridge and its default setup, only two of those processes should actually have copies of your application running in it. The other httpd processes are the parent monitor process and the Apache child worker processes which will proxy requests to the processes which are actually running your web application.
If you need control to reduce it down to one process, then you would need to follow:
http://blog.dscpl.com.au/2015/01/using-alternative-wsgi-servers-with.html
to override the standard setup and use mod_wsgi-express instead. This will default to using one process for your application and allow you to control both number of processes and threads for the application processes.
If you are seeing lots of memory use, then it could just be your application code, or there is an outside chance you are seeing memory issues due to use of older mod_wsgi as there are some odd corner cases which can cause extra memory usage because of how Apache works. If you use mod_wsgi-express it will use the latest and avoid those problems.
So try mod_wsgi-express and if still have memory issues, suggest you get on the mod_wsgi mailing list to get help debugging it.
I think in the past python scripts would run off CGI, which would create a new thread for each process.
I am a newbie so I'm not really sure, what options do we have?
Is the web server pipeline that python works under any more/less effecient than say php?
You can still use CGI if you want, but the normal approach these days is using WSGI on the Python side, e.g. through mod_wsgi on Apache or via bridges to FastCGI on other web servers. At least with mod_wsgi, I know of no inefficiencies with this approach.
BTW, your description of CGI ("create a new thread for each process") is inaccurate: what it does is create a new process for each query's service (and that process typically needs to open a database connection, import all needed modules, etc etc, which is what may make it slow even on platforms where forking a process, per se, is pretty fast, such as all Unix variants).
I would suggest Django http://www.djangoproject.com. It is very convenient to use, has everything you need for making web services. The most efficient way to use it is to run it as via Apache's mod_wsgi, and make Apache itself serve the static files.
This generally has better performance than solutions such as CGI and mod-python, as the Python process running the web service runs separate from the main web server, so it can cache stuff and easily re-use resources (like DB handles).
Also, you can then tweak the number of worker threads for Apache and your web application separately, resulting in better scalability.
I suggest cherrypy (http://www.cherrypy.org/). It is very convenient to use, has everything you need for making web services, but still quite simple (no mega-framework). The most efficient way to use it is to run it as self-contained server on localhost and put it behind Apache via a Proxy statement, and make apache itself serve the static files.
This generally has better performance than solutions such as CGI and mod-python, as the Python process running the web service runs separate from the main web server, so it can cache stuff and easily re-use resources (like DB handles).
Also, you can then tweak the number of worker threads for Apache and your web application separately, resulting in better scalability.
I am deploying a WSGI application. There are many ways to skin this cat. I am currently using apache2 with mod-wsgi, but I can see some potential problems with this.
So how can it be done?
Apache Mod-wsgi (the other mod-wsgi's seem to not be worth it)
Pure Python web server eg paste, cherrypy, Spawning, Twisted.web
as 2 but with reverse proxy from nginx, apache2 etc, with good static file handling
Conversion to other protocol such as FCGI with a bridge (eg Flup) and running in a conventional web server.
More?
I want to know how you do it, and why it is the best way to do it. I would absolutely love you to bore me with details about the whats and the whys, application specific stuff, etc.
As always: It depends ;-)
When I don't need any apache features I am going with a pure python webserver like paste etc. Which one exactly depends on your application I guess and can be decided by doing some benchmarks. I always wanted to do some but never came to it. I guess Spawning might have some advantages in using non blocking IO out of the box but I had sometimes problems with it because of the patching it's doing.
You are always free to put a varnish in front as well of course.
If an Apache is required I am usually going with solution 3 so that I can keep processes separate. You can also more easily move processes to other servers etc. I simply like to keep things separate.
For static files I am using right now a separate server for a project which just serves static images/css/js. I am using lighttpd as webserver which has great performance (in this case I don't have a varnish in front anymore).
Another useful tool is supervisord for controlling and monitoring these services.
I am additionally using buildout for managing my deployments and development sandboxes (together with virtualenv).
I would absolutely love you to bore me with details about the whats and the whys, application specific stuff, etc
Ho. Well you asked for it!
Like Daniel I personally use Apache with mod_wsgi. It is still new enough that deploying it in some environments can be a struggle, but if you're compiling everything yourself anyway it's pretty easy. I've found it very reliable, even the early versions. Props to Graham Dumpleton for keeping control of it pretty much by himself.
However for me it's essential that WSGI applications work across all possible servers. There is a bit of a hole at the moment in this area: you have the WSGI standard telling you what a WSGI callable (application) does, but there's no standardisation of deployment; no single way to tell the web server where to find the application. There's also no standardised way to make the server reload the application when you've updated it.
The approach I've adopted is to put:
all application logic in modules/packages, preferably in classes
all website-specific customisations to be done by subclassing the main Application and overriding members
all server-specific deployment settings (eg. database connection factory, mail relay settings) as class __init__() parameters
one top-level ‘application.py’ script that initialises the Application class with the correct deployment settings for the current server, then runs the application in such a way that it can work deployed as a CGI script, a mod_wsgi WSGIScriptAlias (or Passenger, which apparently works the same way), or can be interacted with from the command line
a helper module that takes care of above deployment issues and allows the application to be reloaded when the modules the application is relying on change
So what the application.py looks like in the end is something like:
#!/usr/bin/env python
import os.path
basedir= os.path.dirname(__file__)
import MySQLdb
def dbfactory():
return MySQLdb.connect(db= 'myappdb', unix_socket= '/var/mysql/socket', user= 'u', passwd= 'p')
def appfactory():
import myapplication
return myapplication.Application(basedir, dbfactory, debug= False)
import wsgiwrap
ismain= __name__=='__main__'
libdir= os.path.join(basedir, 'system', 'lib')
application= wsgiwrap.Wrapper(appfactory, libdir, 10, ismain)
The wsgiwrap.Wrapper checks every 10 seconds to see if any of the application modules in libdir have been updated, and if so does some nasty sys.modules magic to unload them all reliably. Then appfactory() will be called again to get a new instance of the updated application.
(You can also use command line tools such as
./application.py setup
./application.py daemon
to run any setup and background-tasks hooks provided by the application callable — a bit like how distutils works. It also responds to start/stop/restart like an init script.)
Another trick I use is to put the deployment settings for multiple servers (development/testing/production) in the same application.py script, and sniff ‘socket.gethostname()’ to decide which server-specific bunch of settings to use.
At some point I might package wsgiwrap up and release it properly (possibly under a different name). In the meantime if you're interested, you can see a dogfood-development version at http://www.doxdesk.com/file/software/py/v/wsgiwrap-0.5.py.
The absolute easiest thing to deploy is CherryPy. Your web application can also become a standalone webserver. CherryPy is also a fairly fast server considering that it's written in pure Python. With that said, it's not Apache. Thus, I find that CherryPy is a good choice for lower volume webapps.
Other than that, I don't think there's any right or wrong answer to this question. Lots of high-volume websites have been built on the technologies you talk about, and I don't think you can go too wrong any of those ways (although I will say that I agree with mod-wsgi not being up to snuff on every non-apache server).
Also, I've been using isapi_wsgi to deploy python apps under IIS. It's a less than ideal setup, but it works and you don't always get to choose otherwise when you live in a windows-centric world.
Nginx reverse proxy and static file sharing + XSendfile + uploadprogress_module. Nothing beats it for the purpose.
On the WSGI side either Apache + mod_wsgi or cherrypy server. I like to use cherrypy wsgi server for applications on servers with less memory and less requests.
Reasoning:
I've done benchmarks with different tools for different popular solutions.
I have more experience with lower level TCP/IP than web development, especially http implementations. I'm more confident that I can recognize a good http server than I can recognize a good web framework.
I know Twisted much more than Django or Pylons. The http stack in Twisted is still not up to this but it will be there.
I'm using Google App Engine for an application I'm developing. It runs WSGI applications.
Here's a couple bits of info on it.
This is the first web-app I've ever really worked on, so I don't have a basis for comparison, but if you're a Google fan, you might want to look into it. I've had a lot of fun using it as my framework for learning.
TurboGears (2.0)
TurboGears 2.0 is leaving Beta within the next month (has been in it for plenty of time). 2.0 improves upon 1.0 series and attempts to give you best-of-breed WSGI stack, so it makes some default choices for you, if you want the least fuss.
it has the tg* tools for testing and deployment in 1.x series, but now transformed to paster equivalents in 2.0 series, which shoud seem familiar if you've expermiented with pylons.
tg-admin quickstart —> paster quickstart
tg-admin info —> paster tginfo
tg-admin toolbox –> paster toolbox
tg-admin shell –> paster shell
tg-admin sql create –> paster setup-app development.ini
Pylons
It you'd like to be more flexible in your WSGI stack (choice of ORM, choice of templater, choice of form-ing), Pylons is becoming the consolidated choice. This would be my recommended choice, since it offers excellent documentation and allows you to experiment with different components.
It is a pleasure to work with as a result, and works on under Apache (production deployment) or stand-alone (helpful for testing and experimenting stage).
so it follows, you can do both with Pylons:
2 option for testing stage (python standalone)
4 for scalable production purposes (FastCGI, assuming the database you choose can keep up)
The Pylons admin interface is very similar to TurboGears. Here's a toy standalone example:
$ paster create -t pylons helloworld
$ cd helloworld
$ paster serve --reload development.ini
for production-class deployment, you could refer to the setup guide of Apache + FastCGI + mod_rewrite is available here. this would scale up to most needs.
Apache httpd + mod_fcgid using web.py (which is a wsgi application).
Works like a charm.
We are using pure Paste for some of our web services. It is easy to deploy (with our internal deployment mechanism; we're not using Paste Deploy or anything like that) and it is nice to minimize the difference between production systems and what's running on developers' workstations. Caveat: we don't expect low latency out of Paste itself because of the heavyweight nature of our requests. In some crude benchmarking we did we weren't getting fantastic results; it just ended up being moot due to the expense of our typical request handler. So far it has worked fine.
Static data has been handled by completely separate (and somewhat "organically" grown) stacks, including the use of S3, Akamai, Apache and IIS, in various ways.
Apache+mod_wsgi,
Simple, clean. (only four lines of webserver config), easy for other sysadimns to get their head around.
Good morning.
As the title indicates, I've got some questions about using python for web development.
What is the best setup for a development environment, more specifically, what webserver to use, how to bind python with it. Preferably, I'd like it to be implementable in both, *nix and win environment.
My major concern when I last tried apache + mod_python + CherryPy was having to reload webserver to see the changes. Is it considered normal? For some reason cherrypy's autoreload didn't work at all.
What is the best setup to deploy a working Python app to production and why? I'm now using lighttpd for my PHP web apps, but how would it do for python compared to nginx for example?
Is it worth diving straight with a framework or to roll something simple of my own? I see that Django has got quite a lot of fans, but I'm thinking it would be overkill for my needs, so I've started looking into CherryPy.
How exactly are Python apps served if I have to reload httpd to see the changes? Something like a permanent process spawning child processes, with all the major file includes happening on server start and then just lazy loading needed resources?
Python supports multithreading, do I need to look into using that for a benefit when developing web apps? What would be that benefit and in what situations?
Big thanks!
What is the best setup for a development environment?
Doesn't much matter. We use Django, which runs in Windows and Unix nicely. For production, we use Apache in Red Hat.
Is having to reload webserver to see the changes considered normal?
Yes. Not clear why you'd want anything different. Web application software shouldn't be dynamic. Content yes. Software no.
In Django, we develop without using a web server of any kind on our desktop. The Django "runserver" command reloads the application under most circumstances. For development, this works great. The times when it won't reload are when we've damaged things so badly that the app doesn't properly.
What is the best setup to deploy a working Python app to production and why?
"Best" is undefined in this context. Therefore, please provide some qualification for "nest" (e.g., "fastest", "cheapest", "bluest")
Is it worth diving straight with a framework or to roll something simple of my own?
Don't waste time rolling your own. We use Django because of the built-in admin page that we don't have to write or maintain. Saves mountains of work.
How exactly are Python apps served if I have to reload httpd to see the changes?
Two methods:
Daemon - mod_wsgi or mod_fastcgi have a Python daemon process to which they connect. Change your software. Restart the daemon.
Embedded - mod_wsgi or mod_python have an embedded mode in which the Python interpreter is inside the mod, inside Apache. You have to restart httpd to restart that embedded interpreter.
Do I need to look into using multi-threaded?
Yes and no. Yes you do need to be aware of this. No, you don't need to do very much. Apache and mod_wsgi and Django should handle this for you.
So here are my thoughts about it:
I am using Python Paste for developing my app and eventually also running it (or any other python web server). I am usually not using mod_python or mod_wsgi as it makes development setup more complex.
I am using zc.buildout for managing my development environment and all dependencies together with virtualenv. This gives me an isolated sandbox which does not interfere with any Python modules installed system wide.
For deployment I am also using buildout/virtualenv, eventually with a different buildout.cfg. I am also using Paste Deploy and it's configuration mechanism where I have different config files for development and deployment.
As I am usually running paste/cherrypy etc. standalone I am using Apache, NGINX or maybe just a Varnish alone in front of it. It depends on what configuration options you need. E.g. if no virtual hosting, rewrite rules etc. are needed, then I don't need a full featured web server in front. When using a web server I usually use ProxyPass or some more complex rewriting using mod_rewrite.
The Python web framework I use at the moment is repoze.bfg right now btw.
As for your questions about reloading I know about these problems when running it with e.g. mod_python but when using a standalone "paster serve ... -reload" etc. it so far works really well. repoze.bfg additionally has some setting for automatically reloading templates when they change. If the framework you use has that should be documented.
As for multithreading that's usually used then inside the python web server. As CherryPy supports this I guess you don't have to worry about that, it should be used automatically. You should just eventually make some benchmarks to find out under what number of threads your application performs the best.
Hope that helps.
+1 to MrTopf's answer, but I'll add some additional opinions.
Webserver
Apache is the webserver that will give you the most configurability. Avoid mod_python because it is basically unsupported. On the other hand, mod_wsgi is very well supported and gives you better stability (in other words, easier to configure for cpu/memory usage to be stable as opposed to spikey and unpredictable).
Another huge benefit, you can configure mod_wsgi to reload your application if the wsgi application script is touched, no need to restart Apache. For development/testing servers you can even configure mod_wsgi to reload when any file in your application is changed. This is so helpful I even run Apache+mod_wsgi on my laptop during development.
Nginx and lighttpd are commonly used for webservers, either by serving Python apps directly through a fastCGI interface (don't bother with any WSGI interfaces on these servers yet) or by using them as a front end in front of Apache. Calls into the app get passed through (by proxy) to Apache+mod_wsgi and then nginx/lighttpd serve the static content directly.
Nginx has the added advantage of being able to serve content directly from memcached if you want to get that sophisticated. I've heard disparaging comments about lighttpd and it does seem to have some development problems, but there are certainly some big companies using it successfully.
Python stack
At the lowest level you can program to WSGI directly for the best performance. There are lots of helpful WSGI modules out there to help you in areas you don't want to develop yourself. At this level you'll probably want to pick third-party WSGI components to do things like URL resolving and HTTP request/response handling. A great request/response component is WebOb.
If you look at Pylons you can see their idea of "best-of-breed" WSGI components and a framework that makes it easier than Django to choose your own components like templating engine.
Django might be overkill but I don't think that's a really good argument against. Django makes the easy stuff easier. When you start to get into very complicated applications is where you really need to look at moving to lower level frameworks.
Look at Google App Engine. From their website:
Google App Engine lets you run your
web applications on Google's
infrastructure. App Engine
applications are easy to build, easy
to maintain, and easy to scale as your
traffic and data storage needs grow.
With App Engine, there are no servers
to maintain: You just upload your
application, and it's ready to serve
your users.
You can serve your app using a free
domain name on the appspot.com domain,
or use Google Apps to serve it from
your own domain. You can share your
application with the world, or limit
access to members of your
organization.
App Engine costs nothing to get
started. Sign up for a free account,
and you can develop and publish your
application for the world to see, at
no charge and with no obligation. A
free account can use up to 500MB of
persistent storage and enough CPU and
bandwidth for about 5 million page
views a month.
Best part of all: It includes Python support, including Django. Go to http://code.google.com/appengine/docs/whatisgoogleappengine.html
When you use mod_python on a threaded Apache server (the default on Windows), CherryPy runs in the same process as Apache. In that case, you almost certainly don't want CP to restart the process.
Solution: use mod_rewrite or mod_proxy so that CherryPy runs in its own process. Then you can autoreload to your heart's content. :)