New Stack Overflow user here. I need help with an Apache freezing problem. I have a WampServer setup on Windows 7 64-bit and am working with Python / Django / MySQL / mod_wsgi / matplotlib, experimenting with dynamically rendered images. I am using Apache to serve static files.
I am trying to plot data from a MySQL database. My views.py file is below. When I invoke the function "view_Stats" by visiting the appropriate web page, this calls the "CreateFig" function to create and save .png files to a directory that are subsequently served by Apache. It works fine initially, but it seems as if a maximum of 8 calls can be made to the "CreateFig" function before Apache just hangs. I have to restart Apache at that point, but it takes a while (minutes) for it to restart.
Looking at the Apache error logs (see below) shows an error related to Apache child processes that requires Apache to force it to terminate. I suspect some sort of memory leak / error, but I'm pretty new at this and can't troubleshoot well; I've Googled this and looked around on StackOverflow, no joy.
Any help would be appreciated!
[Tue Mar 11 17:01:07.550093 2014] [core:notice] [pid 2820:tid 404] AH00094: Command line: 'c:\\wamp\\bin\\apache\\apache2.4.4\\bin\\httpd.exe -d C:/wamp/bin/apache/Apache2.4.4'
[Tue Mar 11 17:01:07.551093 2014] [mpm_winnt:notice] [pid 2820:tid 404] AH00418: Parent: Created child process 3528
[Tue Mar 11 17:01:07.856093 2014] [mpm_winnt:notice] [pid 3528:tid 324] AH00354: Child: Starting 150 worker threads.
[Tue Mar 11 17:04:53.233893 2014] [mpm_winnt:notice] [pid 2820:tid 404] AH00422: Parent: Received shutdown signal -- Shutting down the server.
[Tue Mar 11 17:05:23.248293 2014] [mpm_winnt:notice] [pid 2820:tid 404] AH00431: Parent: Forcing termination of child process 3528
The Code from views.py is below:
from django.contrib import auth
from django.contrib.auth.models import User, Group
from django.core.context_processors import csrf
from django.shortcuts import render_to_response
from django.http import Http404, HttpResponseRedirect
from rwjcnlab import settings
from clientele.models import UserProfile
from reports.models import EEG, LTM, EMU, AEEG
import os, datetime
import numpy
from pylab import *
import matplotlib.pyplot as plt; plt.rcdefaults()
import matplotlib.pyplot as plt
import gc
# CREATE VIEWS HERE
def view_Stats(request):
    UID = UserProfile.objects.get(user_id = request.user.id)
    StatsEEG, StatsLTM, StatsAEEG, StatsEMU, start_date = ReportNumbers(UID.id)
    # Create figures
    CreateFig(StatsEEG, 300, 50, 'EEG', 'b')
    CreateFig(StatsLTM, 100, 10, 'LTM', 'r')
    CreateFig(StatsAEEG, 15, 3, 'AEEG', 'y')
    CreateFig(StatsEMU, 25, 5, 'EMU', 'c')
    return render_to_response('view_Stats.html', {
        'StatsEEG': StatsEEG,
        'StatsLTM': StatsLTM,
        'StatsAEEG': StatsAEEG,
        'StatsEMU': StatsEMU,
        'start_date': start_date,
        'user': request.user,
    })

def CreateFig(Stats, ymax, yinc, figname, c):
    nAll = tuple(x[1] for x in Stats)
    nUser = tuple(x[2] for x in Stats)
    xlabels = tuple(x[0].strftime("%b%y") for x in Stats)
    ind = numpy.arange(len(xlabels)-1.4,-0.4,-1) # the x locations for the groups
    width = 0.8 # the width of the bars: can also be len(x) sequence
    plt.ioff()
    fig = plt.figure(figsize=(10, 5), dpi=72, facecolor='w', edgecolor='k')
    p1 = plt.bar(ind, nAll[1:], width, color=c)
    p2 = plt.bar(ind, nUser[1:], width, color='g')
    plt.title(figname+' Volumes at RWJUH')
    plt.xticks(ind+width/2., xlabels[1:])
    plt.yticks(numpy.arange(0,ymax,yinc))
    plt.legend( (p1[0], p2[0]), ('Total', 'User') )
    plt.savefig(os.path.join(settings.BASE_DIR, 'static/'+figname+'.png'))
    fig.clf()
    plt.close(fig)
    gc.collect()
    return
This is likely because you're trying to connect to a (presumably non-existent) X server when you use matplotlib. Even if you do have X running on your web server, you probably still want to avoid using an interactive backend for matplotlib.
(Edit: Just saw that you're on Windows. Obviously, matplotlib isn't trying to connect to an X server when run on Windows, but I'd be willing to bet that your problem is still related to using an interactive backend and matplotlib trying to connect to the graphical display.)
If you want to use matplotlib without interactive plots (i.e. without needing an X server), then you need to explicitly use a non-interactive backend (e.g. Agg, pdf, etc.).
First off, remove from pylab import *. That's a really bad idea for a huge number of reasons (hint: min and max aren't what you think they are, among other things). Also, you don't seem to be using it. You're already accessing matplotlib functionality through the pyplot interface and numpy through the numpy namespace.
Next, before you do import matplotlib.pyplot as plt (or before you do from pylab import * if you decide not to remove it), do:
import matplotlib
matplotlib.use('Agg')
import matplotlib.pyplot # etc...
Now matplotlib won't try to connect to a graphical display every time you make a new figure.
I would like to automatically generate some sort of log of all the database changes that are made via the Django shell in the production environment.
We use schema and data migration scripts to alter the production database, and they are version controlled. Therefore, if we introduce a bug, it's easy to track it back. But if a developer in the team changes the database via the Django shell and that introduces an issue, at the moment we can only hope that they remember what they did and/or that we can find their commands in the Python shell history.
Example. Let's imagine that the following code was executed by a developer in the team via the Python shell:
>>> tm = TeamMembership.objects.get(person=alice)
>>> tm.end_date = date(2022,1,1)
>>> tm.save()
It changes a team membership object in the database. I would like to log this somehow.
I'm aware that there are a bunch of Django packages related to audit logging, but I'm only interested in the changes that are triggered from the Django shell, and I want to log the Python code that updated the data.
So the questions I have in mind:
I can log the statements from IPython but how do I know which one touched the database?
I can listen to the pre_save signal for all models to know if data changes, but how do I know if the source was the Python shell? How do I know what the original Python statement was?
This solution logs all commands in the session if any database changes were made.
How to detect database changes
Wrap execute_sql of SQLInsertCompiler, SQLUpdateCompiler and SQLDeleteCompiler.
SQLDeleteCompiler.execute_sql returns a cursor wrapper.
from django.db.models.sql.compiler import SQLInsertCompiler, SQLUpdateCompiler, SQLDeleteCompiler
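changed = False
def check_changed(func):
    def _func(*args, **kwargs):
        # `changed` is a module-level name here, so it must be declared `global`;
        # `nonlocal` only works when the flag lives in an enclosing function.
        global changed
        result = func(*args, **kwargs)
        if not changed and result:
            changed = not hasattr(result, 'cursor') or bool(result.cursor.rowcount)
        return result
    return _func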
SQLInsertCompiler.execute_sql = check_changed(SQLInsertCompiler.execute_sql)
SQLUpdateCompiler.execute_sql = check_changed(SQLUpdateCompiler.execute_sql)
SQLDeleteCompiler.execute_sql = check_changed(SQLDeleteCompiler.execute_sql)
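The wrapping technique itself can be tried without a Django project. The following is only a minimal sketch: FakeUpdateCompiler is a hypothetical stand-in for the real compiler classes (it is not part of Django), and a module-level dict replaces the changed flag so that no global declaration is needed.

```python
import functools

class FakeUpdateCompiler:
    """Hypothetical stand-in for a Django SQL compiler; NOT a Django class."""

    def execute_sql(self, rows_affected):
        # Pretend the database reported this many affected rows.
        return rows_affected

# Shared flag, kept in a dict so the wrapper can mutate it without `global`.
state = {"changed": False}

def check_changed(func):
    """Wrap `func` so that any truthy result marks the session as changed."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        result = func(*args, **kwargs)
        if result:
            state["changed"] = True
        return result
    return wrapper

# Patch the class attribute, exactly as done for the real compilers above.
FakeUpdateCompiler.execute_sql = check_changed(FakeUpdateCompiler.execute_sql)
```

Because the class attribute is replaced, every instance created before or after the patch goes through the wrapper.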
How to log commands made via the Django shell
atexit.register() an exit handler that does readline.write_history_file().
import atexit
import readline
def exit_handler():
    filename = 'history.py'
    readline.write_history_file(filename)
atexit.register(exit_handler)
IPython
Check whether IPython was used by comparing HistoryAccessor.get_last_session_id().
import atexit
import io
import readline
ipython_last_session_id = None
try:
    from IPython.core.history import HistoryAccessor
except ImportError:
    pass
else:
    ha = HistoryAccessor()
    ipython_last_session_id = ha.get_last_session_id()
def exit_handler():
    filename = 'history.py'
    if ipython_last_session_id and ipython_last_session_id != ha.get_last_session_id():
        cmds = '\n'.join(cmd for _, _, cmd in ha.get_range(ha.get_last_session_id()))
        with io.open(filename, 'a', encoding='utf-8') as f:
            f.write(cmds)
            f.write('\n')
    else:
        readline.write_history_file(filename)
atexit.register(exit_handler)
Put it all together
Add the following in manage.py before execute_from_command_line(sys.argv).
if len(sys.argv) > 1 and sys.argv[1] == 'shell':
    import atexit
    import io
    import readline
    from django.db.models.sql.compiler import SQLInsertCompiler, SQLUpdateCompiler, SQLDeleteCompiler
    changed = False
    def check_changed(func):
        def _func(*args, **kwargs):
            # manage.py defines `changed` at module level, so use `global`, not `nonlocal`
            global changed
            result = func(*args, **kwargs)
            if not changed and result:
                changed = not hasattr(result, 'cursor') or bool(result.cursor.rowcount)
            return result
        return _func
    SQLInsertCompiler.execute_sql = check_changed(SQLInsertCompiler.execute_sql)
    SQLUpdateCompiler.execute_sql = check_changed(SQLUpdateCompiler.execute_sql)
    SQLDeleteCompiler.execute_sql = check_changed(SQLDeleteCompiler.execute_sql)
    ipython_last_session_id = None
    try:
        from IPython.core.history import HistoryAccessor
    except ImportError:
        pass
    else:
        ha = HistoryAccessor()
        ipython_last_session_id = ha.get_last_session_id()
    def exit_handler():
        if changed:
            filename = 'history.py'
            if ipython_last_session_id and ipython_last_session_id != ha.get_last_session_id():
                cmds = '\n'.join(cmd for _, _, cmd in ha.get_range(ha.get_last_session_id()))
                with io.open(filename, 'a', encoding='utf-8') as f:
                    f.write(cmds)
                    f.write('\n')
            else:
                readline.write_history_file(filename)
    atexit.register(exit_handler)
I would consider something like this:
Wrap each Python session with some sort of initialisation code, e.g. using the PYTHONSTARTUP environment variable:
https://docs.python.org/3/using/cmdline.html#envvar-PYTHONSTARTUP
In the file that PYTHONSTARTUP points to, register an exit handler using atexit:
https://docs.python.org/3/library/atexit.html
These two things should allow you to use some lower-level APIs of django-reversion to wrap the whole terminal session with
https://django-reversion.readthedocs.io/en/stable/api.html#creating-revisions (something like this, but calling __enter__ and __exit__ of that context manager directly in your startup and atexit code). Unfortunately I don't know the details, but it should be doable.
In the atexit handler (at the end of the revision), call code that lists the additional lines of the terminal session and stores them somewhere else in the database with a reference to the specific revision.
See:
https://docs.python.org/3/library/readline.html#readline.get_history_length
https://docs.python.org/3/library/readline.html#readline.get_history_item
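Basically, the idea is that you could read the current history length twice: at the beginning and at the end of the terminal session. (Note that readline.get_history_length returns the maximum number of lines saved to the history file; the number of items currently in the history comes from readline.get_current_history_length.) That will allow you to get the relevant lines where the change took place using get_history_item. You may end up with more lines of history than you actually need, but at least there is enough context to see what's going on.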
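The "measure the history twice" idea can be sketched as follows. This is an illustration only, assuming a platform where the readline module is available; history_since is a hypothetical helper name, not part of readline or django-reversion.

```python
import readline

def history_since(start_length):
    """Return the history lines recorded after `start_length` items existed."""
    end_length = readline.get_current_history_length()
    # readline history items are 1-indexed.
    return [
        readline.get_history_item(index)
        for index in range(start_length + 1, end_length + 1)
    ]

# At session start (for example in the PYTHONSTARTUP file), remember the length.
session_start = readline.get_current_history_length()
```

In the atexit handler, history_since(session_start) would then give the lines typed during the session, which could be stored next to the revision.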
Based on the answer of aaron and the implementation of the built-in IPython magic %logstart, this is the solution we came up with in the end.
All commands of the last IPython session are logged in a history file if any of the commands triggered a database write through the Django ORM.
Here's an excerpt of the generated history file:
❯ cat ~/.python_shell_write_history
# Thu, 27 Jan 2022 16:20:28
#
# New Django shell session started
#
# Thu, 27 Jan 2022 16:20:28
from people.models import *
# Thu, 27 Jan 2022 16:20:28
p = Person.objects.first()
# Thu, 27 Jan 2022 16:20:28
p
#[Out]# <Person: Test Albero Jose Maria>
# Thu, 27 Jan 2022 16:20:28
p.email
#[Out]# 'test-albero-jose-maria@gmail.com'
# Thu, 27 Jan 2022 16:20:28
p.save()
Here's our manage.py now:
#!/usr/bin/env python
import os
import sys
def shell_audit(logfname: str) -> None:
    """If any of the Python shell commands changed the Django database during the
    session, capture all commands in a logfile for future analysis."""
    import atexit
    from django.db.models.sql.compiler import (
        SQLDeleteCompiler,
        SQLInsertCompiler,
        SQLUpdateCompiler,
    )
    changed = False
    def check_changed(func):
        def _func(*args, **kwargs):
            nonlocal changed
            result = func(*args, **kwargs)
            if not changed and result:
                changed = not hasattr(result, "cursor") or bool(result.cursor.rowcount)
            return result
        return _func
    SQLInsertCompiler.execute_sql = check_changed(SQLInsertCompiler.execute_sql)
    SQLUpdateCompiler.execute_sql = check_changed(SQLUpdateCompiler.execute_sql)
    SQLDeleteCompiler.execute_sql = check_changed(SQLDeleteCompiler.execute_sql)
    def exit_handler():
        if not changed:
            return None
        from IPython.core import getipython
        shell = getipython.get_ipython()
        if not shell:
            return None
        logger = shell.logger
        # Logic borrowed from %logstart (IPython.core.magics.logging)
        loghead = ""
        log_session_head = "#\n# New Django shell session started\n#\n"
        logmode = "append"
        log_output = True
        timestamp = True
        log_raw_input = False
        logger.logstart(logfname, loghead, logmode, log_output, timestamp, log_raw_input)
        log_write = logger.log_write
        input_hist = shell.history_manager.input_hist_parsed
        output_hist = shell.history_manager.output_hist_reprs
        log_write(log_session_head)
        for n in range(1, len(input_hist)):
            log_write(input_hist[n].rstrip() + "\n")
            if n in output_hist:
                log_write(output_hist[n], kind="output")
    atexit.register(exit_handler)

if __name__ == "__main__":
    os.environ.setdefault("DJANGO_SETTINGS_MODULE", "project.settings")
    try:
        from django.core.management import execute_from_command_line
    except ImportError:
        # The above import may fail for some other reason. Ensure that the
        # issue is really that Django is missing to avoid masking other
        # exceptions on Python 2.
        try:
            import django  # noqa: F401
        except ImportError:
            raise ImportError(
                "Couldn't import Django. Are you sure it's installed and "
                "available on your PYTHONPATH environment variable? Did you "
                "forget to activate a virtual environment?"
            )
        raise
    # Guard against running manage.py without a subcommand
    if len(sys.argv) > 1 and sys.argv[1] == "shell":
        logfname = os.path.expanduser("~/.python_shell_write_history")
        shell_audit(logfname)
    execute_from_command_line(sys.argv)
You could use Django's receiver decorator.
For example, if you want to detect any call of the save method, you could do:
from django.db.models.signals import post_save
from django.dispatch import receiver
import logging
@receiver(post_save)
def logg_save(sender, instance, **kwargs):
    logging.debug("whatever you want to log")
See the Django signals documentation for more details.
I have a python function that uses ctypes to call some c code
The python code looks like:
import numpy as np
from ctypes import *
dll=cdll.LoadLibrary("./test.so")
def callTestFunction():
    out=np.zeros(shape=(10),dtype=np.float32)
    dll.testFunction(out.ctypes.data,10)
    return out
And the C code looks like:
void testFunction(float values[], int l){
    for(int i=0;i<l;i++){
        values[i]=1;
    }
}
This code works fine if I run python and call the function.
But when I import the same code and call the function inside of mod_wsgi, I get a segmentation fault.
I have already set WSGIApplicationGroup %{GLOBAL} and if I capture the segfault in httpd using gdb, it says
#0 0x00007fffc9fa9bad in testFunction (values=0x563e93b0, l=10) at test.c:63
63 values[i]=1;
(gdb) print *values@10
Cannot access memory at address 0x563e93b0
My guess is Apache is enforcing some kind of memory boundary between my python code and my c library? Does anyone have a solution for this, or know of a better way to return an array from c to python inside of mod_wsgi?
Update:
I added print statements to python and c to print out
sizeof(c_void_p), out.ctypes.data, and values.
in ipython
out.ctypes.data 0x23dde40
sizeof(c_void_p) 8
values (in c): 23dde40
in apache
[Sun May 10 17:37:01.647440 2020] [wsgi:error] [pid 7101] [client 127.0.0.1:60346] out.ctypes.data 0x55555645f7c0
[Sun May 10 17:37:01.647592 2020] [wsgi:error] [pid 7101] [client 127.0.0.1:60346] sizeof(c_void_p) 8
...
(gdb) p values
$1 = (float *) 0x5645f7c0
So there is a difference in Apache! 0x55555645f7c0 vs 0x5645f7c0
If I look at the correct memory location in GDB, it looks promising!
(gdb) p *0x55555645f7c0@40
$2 = {0 <repeats 40 times>}
Turns out I need to cast out.ctypes.data as c_void_p!
Corrected python code:
import numpy as np
from ctypes import *
dll=cdll.LoadLibrary("./test.so")
def callTestFunction():
    out=np.zeros(shape=(10),dtype=np.float32)
    dll.testFunction(c_void_p(out.ctypes.data),c_int(10))
    return out
I still have no idea why this works in ipython, but not Apache.
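A possible explanation, offered as a sketch and not a confirmed diagnosis: when no argtypes are declared, ctypes passes a plain Python integer as a C int and masks the value to fit. The address seen in ipython (0x23dde40) fits in 32 bits, so it survived; the address seen under Apache (0x55555645f7c0) does not, which matches the truncated 0x5645f7c0 that gdb reported. The snippet below reproduces only the arithmetic, using the addresses from the output above.

```python
import ctypes

# The value of out.ctypes.data printed under Apache.
address = 0x55555645f7c0

# A bare Python int is converted like a C int: only the low 32 bits survive
# on platforms where int is 32 bits wide.
as_int = ctypes.c_int(address).value

# Wrapping the address in c_void_p keeps the full pointer width.
as_pointer = ctypes.c_void_p(address).value

print(hex(as_int))      # 0x5645f7c0, the truncated address gdb showed
print(hex(as_pointer))  # unchanged on a 64-bit build

# Declaring the signature once has the same effect as wrapping every call:
#   dll.testFunction.argtypes = [ctypes.c_void_p, ctypes.c_int]
```

With argtypes declared, ctypes performs the conversion for every call, so the explicit c_void_p(...) wrapper at the call site is no longer needed.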
I am developing an app in Python that runs continuously (it is a controller for a heat pump system), and I use Flask to provide a user interface to control the app.
The flask app has different control items, for instance buttons to turn the system on or off.
I am trying to execute a specific function from a python module in response to a "click" on a button (the final goal is to change a value in an mmap resource that will be read in another module to change the state of the system).
In the flask app I have something like:
@app.route('/cntr_hpauto',methods=['GET','POST'])
@basic_auth.required
def cntr_hpauto():
    manage_globals.set_from_web()
    return render_template('control.html',cur_hp_mode="auto")
However, this generates an "internal server error".
The complete flask app is (manage_globals is the *.py file I want to import and that contains the function I want to call):
from flask import Flask, request, render_template
from flask_basicauth import BasicAuth
import sys
import os
import mmap
import manage_globals
app = Flask(__name__)
app.config['BASIC_AUTH_USERNAME'] = '***'
app.config['BASIC_AUTH_PASSWORD'] = '***'
basic_auth = BasicAuth(app)
@app.route('/')
def splash():
    return render_template('splash.html')
@app.route('/dashboard', methods=['GET','POST'])
@basic_auth.required
def dashboard():
    return render_template('dashboard.html')
@app.route('/control',methods=['GET','POST'])
@basic_auth.required
def control():
    return render_template('control.html',cur_hp_mode="none")
@app.route('/cntr_hpauto',methods=['GET','POST'])
@basic_auth.required
def cntr_hpauto():
    manage_globals.set_from_web()
    return render_template('control.html',cur_hp_mode="auto")
@app.route('/cntr_hpon',methods=['GET','POST'])
@basic_auth.required
def cntr_hpon():
    return render_template('control.html',cur_hp_mode="on")
@app.route('/cntr_hpoff',methods=['GET','POST'])
@basic_auth.required
def cntr_hpoff():
    return render_template('control.html',cur_hp_mode="off")
if __name__ == '__main__':
    app.run(ssl_context=('/home/groenhol/certs/groenhol.pem', '/home/groenhol/certs/groenhol.key'))
And the module (example, only writing the map file to a logfile) is:
# 14/08/2017 henk witte / groenholland
# part of geotech project, ann controller dual source heat pump
# this module maintains the global database with mmap
import mmap
""" the mmap file is position dependent!
use readlines and split
line 1: heatpump auto/on/off
line 2: userpump off
line 3: srcselect air
"""
def init_conf_file():
    dummy="a"
def set_from_web():
    with open("geotech.conf", "r+b") as f:
        mm = mmap.mmap(f.fileno(), 0)
        for line in iter(mm.readline, b''):
            with open("globals.log","ab") as f2:
                f2.write(line)
                f2.close()
        mm.close()
if __name__ == '__main__':
    init_conf_file()
The flask app runs fine without the function call, the module I import by itself runs fine as well.
Any help much appreciated!
Henk
As suggested by Kevin Pasquarella, I added app.debug = True. However, as the error already occurs when Apache is loading the main splash page (Apache server error), this did not help. But I then looked at the Apache error log:
[Tue Aug 15 21:33:14.638580 2017] [mpm_event:notice] [pid 959:tid 3067240448] AH00489: Apache/2.4.18 (Ubuntu) OpenSSL/1.0.2g mod_wsgi/4.5.17 Python/3.4 configured -- resuming normal operations
[Tue Aug 15 21:33:14.639152 2017] [core:notice] [pid 959:tid 3067240448] AH00094: Command line: '/usr/sbin/apache2'
[Tue Aug 15 21:33:19.825211 2017] [wsgi:error] [pid 2461:tid 3031819312] [remote 192.168.178.85:9676] mod_wsgi (pid=2461): Target WSGI script '/home/groenhol/py_control/ui/webapp/main.wsgi' cannot be loaded as Python module.
[Tue Aug 15 21:33:19.826502 2017] [wsgi:error] [pid 2461:tid 3031819312] [remote 192.168.178.85:9676] mod_wsgi (pid=2461): Exception occurred processing WSGI script '/home/groenhol/py_control/ui/webapp/main.wsgi'.
[Tue Aug 15 21:33:19.967421 2017] [wsgi:error] [pid 2461:tid 3031819312] [remote 192.168.178.85:9676] Traceback (most recent call last):
[Tue Aug 15 21:33:19.970377 2017] [wsgi:error] [pid 2461:tid 3031819312] [remote 192.168.178.85:9676] File "/home/groenhol/py_control/ui/webapp/main.wsgi", line 4, in <module>
[Tue Aug 15 21:33:19.970581 2017] [wsgi:error] [pid 2461:tid 3031819312] [remote 192.168.178.85:9676] from main import app as application
[Tue Aug 15 21:33:19.971031 2017] [wsgi:error] [pid 2461:tid 3031819312] [remote 192.168.178.85:9676] File "/home/groenhol/py_control/ui/webapp/main.py", line 41
I then searched for mod_wsgi cannot be loaded as python module
Answers indicate there is a difference between the python version I am using (3.4) and the wsgi version.
So I checked the wsgi version in /etc/apache2/mods-enabled/mod-wsgi.load:
LoadModule wsgi_module "/home/groenhol/miniconda3/lib/python3.4/site-packages/mod_wsgi/server/mod_wsgi-py34.cpython-34m.so"
WSGIPythonHome "/home/groenhol/miniconda3"
So seems to use python 3.4 version.
To make sure I use ldd as I found during the search:
groenhol@arm:~/mod_wsgi-4.5.15$ ldd LoadModule wsgi_module "/home/groenhol/miniconda3/lib/python3.4/site-packages/mod_wsgi/server/mod_wsgi-py34.cpython-34m.so"
LoadModule:
ldd: ./LoadModule: No such file or directory
wsgi_module:
ldd: ./wsgi_module: No such file or directory
/home/groenhol/miniconda3/lib/python3.4/site-packages/mod_wsgi/server/mod_wsgi-py34.cpython-34m.so:
linux-vdso.so.1 => (0xbee90000)
libpython3.4m.so.1.0 => /home/groenhol/miniconda3/lib/libpython3.4m.so.1.0 (0xb6d40000)
libpthread.so.0 => /lib/arm-linux-gnueabihf/libpthread.so.0 (0xb6d0f000)
libc.so.6 => /lib/arm-linux-gnueabihf/libc.so.6 (0xb6c23000)
/lib/ld-linux-armhf.so.3 (0x7f64d000)
libdl.so.2 => /lib/arm-linux-gnueabihf/libdl.so.2 (0xb6c10000)
libutil.so.1 => /lib/arm-linux-gnueabihf/libutil.so.1 (0xb6bfd000)
libm.so.6 => /lib/arm-linux-gnueabihf/libm.so.6 (0xb6b85000)
libgcc_s.so.1 => /lib/arm-linux-gnueabihf/libgcc_s.so.1 (0xb6b5c000)
groenhol@arm:~/mod_wsgi-4.5.15$ WSGIPythonHome "/home/groenhol/miniconda3"
-bash: WSGIPythonHome: command not found
As far as I can tell (http://modwsgi.readthedocs.io/en/develop/user-guides/checking-your-installation.html#python-shared-library) this seems OK?
Ok, so next step?
The code:
def set_from_web():
    with open("geotech.conf", "r+b") as f:
        mm = mmap.mmap(f.fileno(), 0)
        for line in iter(mm.readline, b''):
            with open("globals.log","ab") as f2:
                f2.write(line)
                f2.close()
        mm.close()
is going to be a problem because you are using a relative path name to files.
The current working directory of the process will not be where your code is and also will not be writable to the Apache user. You need to use absolute paths and ensure the Apache user has write permission to files.
See:
http://modwsgi.readthedocs.io/en/develop/user-guides/application-issues.html#application-working-directory
http://modwsgi.readthedocs.io/en/develop/user-guides/application-issues.html#access-rights-of-apache-user
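One way to follow the advice on absolute paths is to build every file name from a known base directory. A minimal sketch, where resolve_data_file is a hypothetical helper and the base directory is only an example value:

```python
import os

def resolve_data_file(name, base_dir):
    """Join `name` onto an absolute base directory so the cwd does not matter."""
    return os.path.join(os.path.abspath(base_dir), name)

# In a real module the base directory would typically be derived from the
# module's own location, e.g.:
#   BASE_DIR = os.path.dirname(os.path.abspath(__file__))
# The directory below is an illustrative value only.
conf_path = resolve_data_file("geotech.conf", "/home/py_shared")
log_path = resolve_data_file("globals.log", "/home/py_shared")
```

The open() calls in set_from_web would then take conf_path and log_path, and the Apache user still needs write permission on that directory.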
The solution turned out to be pretty trivial: the code does not load under mod_wsgi if you indent with a mix of spaces and tabs. I changed all indents to tabs and then the code runs!
I found this out by changing the code to something very simple, just returning a string and printing that on the web page created by the flask template. Then I could see the wsgi fault in the Apache log. In the full code other faults were occurring, making it difficult to find out what exactly caused the error.
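The log excerpt above is cut off, so the following rests on an assumption about the underlying error: Python 3 itself rejects indentation that mixes tabs and spaces inconsistently (TabError), and that would surface through mod_wsgi as "cannot be loaded as Python module". A small sketch for checking a source string before deploying it; mixes_tabs_and_spaces is a hypothetical helper name.

```python
def mixes_tabs_and_spaces(source):
    """Return True if Python 3 rejects `source` for inconsistent indentation."""
    try:
        compile(source, "<candidate>", "exec")
    except TabError:
        return True
    return False

# One line indented with a tab, the next with eight spaces.
mixed = "def f():\n\tx = 1\n        return x\n"
# The same function indented with tabs throughout.
consistent = "def f():\n\tx = 1\n\treturn x\n"
```

Reading a file's contents and passing them to this function gives a quick check that does not require restarting Apache.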
I also took care of the comment made by Graham Dumpleton (that Apache cannot write to the directory): I created a shared directory (/home/py_shared) for the www-data group (both the Python user and the Apache user are members of that group). I then set the group of the folder to www-data and used chmod g+w py_shared and chmod g+s py_shared to set the correct permissions.
This topic is discussed on several pages, e.g.:
https://unix.stackexchange.com/questions/154776/create-files-that-both-www-data-and-myuser-can-edit
THANKS for all your suggestions!
As I am not really aware of the underlying strategies or protocols used by Ladon, web services and Apache (I am using Ladon and Python with mod_wsgi.so on a Windows Apache server, since switched to an Ubuntu system),
I wonder whether it is possible to load some resources for Python once, so that exposed methods can use these resources from Python code without having to load them again for each new query to the web services.
Do you have any clue how to achieve this if it is possible, or any workaround if not?
Typically I am loading some huge dictionaries from files that take too much time to load (I/O), and as they are loaded on each new Ladon query, the WS is too slow. I would like to tell Ladon: "load this when Apache starts, and make it available to all my Python web services/code as a dictionary for as long as Apache is running". I will not modify this data, so I just need to be able to read/access it.
Best regards
First EDIT: if this helps, it looks like on my Ubuntu (I have switched from my Windows config to Ubuntu to be more "standard"; I hope I was right to do so) Apache2 is set to prefork mode rather than the worker MPM (as suggested by Jakob Simon-Gaarde), as read from:
#: sudo /usr/sbin/apache2 -l
Compiled in modules:
core.c
mod_log_config.c
mod_logio.c
prefork.c
http_core.c
mod_so.c
#: sudo /usr/sbin/apache2 -l | grep MPM
#:
I'm going to check how this can be done. Maybe I will also put some simplified code here, because for now I'm stuck even with your helpful answers (I can't make anything work here :/).
For installing the worker MPM, I found how to do it here: $ sudo apt-get install apache2-mpm-worker
last EDIT:
here is the skeleton of my WS code :
MODEL_DIR = "/home/mydata.file"
import sys
import codecs
import glob
import os
import re
import numpy
from ladon.ladonizer import ladonize
from ladon.types.ladontype import LadonType
from ladon.compat import PORTABLE_STRING
class Singleton(type):
    _instances = {}
    def __call__(cls, *args, **kwargs):
        if cls not in cls._instances:
            cls._instances[cls] = super(Singleton, cls).__call__(*args, **kwargs)
        return cls._instances[cls]
class LDtest(object):
    __metaclass__ = Singleton
    modeldir = MODEL_DIR
    def __init__(self):
        self.load()
    def load(self):
        modeldir = LDtest.modeldir
        self.data = mywrapperfordata.mywrapperfordata(modeldir)
        b = datetime.datetime.now()
        self.features = self.mywrapperfordata.load() # loading is wrapped here
        c = datetime.datetime.now()
        print("loading: %s done." % (c-b))
    def letsdoit(self, myinput):
        return [] # actually main logic ie complex stuff involving accessing to self.features
    @ladonize(PORTABLE_STRING, [ PORTABLE_STRING ], rtype = [ PORTABLE_STRING ] )
    def ws(self, myinput):
        result = self.letsdoit(myinput)
        return result
import datetime
a = datetime.datetime.now()
myLDtest = LDtest()
b = datetime.datetime.now()
print("LDtest: %s" % (b-a))
About loading time, from my apache2 log (notice that module 1 is required and imported by module 2, and is also provided as a standalone web service). It looks like the singleton is not built, or not quickly enough?
[Tue Jul 09 11:09:11 2013] [notice] caught SIGTERM, shutting down
[Tue Jul 09 11:09:12 2013] [notice] Apache/2.2.16 (Debian) mod_wsgi/3.3 Python/2.6.6 configured -- resuming normal operations
[Tue Jul 09 11:09:50 2013] [error] Module 4: 0:00:02.885693.
[Tue Jul 09 11:09:51 2013] [error] Module 0: 0:00:03.061020
[Tue Jul 09 11:09:51 2013] [error] Module 1: 0:00:00.026059.
[Tue Jul 09 11:09:51 2013] [error] Module 1: 0:00:00.012517.
[Tue Jul 09 11:09:51 2013] [error] Module 2: 0:00:00.012678.
[Tue Jul 09 11:09:51 2013] [error] Module (dbload): 0:00:00.402387 (22030)
[Tue Jul 09 11:09:54 2013] [error] Module 3: 0:00:00.000036.
[Tue Jul 09 11:13:00 2013] [error] Module 0: 0:00:03.055841
[Tue Jul 09 11:13:01 2013] [error] Module 1: 0:00:00.026215.
[Tue Jul 09 11:13:01 2013] [error] Module 1: 0:00:00.012600.
[Tue Jul 09 11:13:01 2013] [error] Module 2: 0:00:00.012643.
[Tue Jul 09 11:13:01 2013] [error] Module (dbload): 0:00:00.322444 (22030)
[Tue Jul 09 11:13:03 2013] [error] Module 3: 0:00:00.000035.
mod_wsgi launches one or more Python processes upon startup and leaves them running to handle requests. If you load a module or set a global variable, they'll still be there when you handle the next request - however, each Python process has its own separate block of memory, so if you configure mod_wsgi to launch 8 processes and load a 1G dataset, eventually you'll be using 8G of memory. Maybe you should consider using a database?
edit: Thanks Graham :-) So with only one process and multiple threads, you can share one copy of your huge dictionary between all worker threads.
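The "load once, reuse on every request" behaviour described above can be sketched with a module-level variable. This is an illustration only: _load_dictionary is a hypothetical stand-in for the slow file load, and load_count exists purely to make the behaviour observable.

```python
load_count = 0  # only here to show how often the load actually runs

def _load_dictionary():
    """Hypothetical stand-in for the slow file load described in the question."""
    global load_count
    load_count += 1
    return {"hello": "world"}

# Runs once, when the module is first imported by a given Python process.
DATA = _load_dictionary()

def lookup(key):
    """Request-time code only reads the already-loaded dictionary."""
    return DATA.get(key)
```

Each process still performs its own load, which is why the number of processes determines the total memory used.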
We use Ladon extensively at my work with all our web projects, and I have the privilege of being able to develop my private project (I am the Ladon developer) and getting paid for it ;-)
Some of our services have very heavy resource consumption; for instance, we have a text-to-speech service that loads around 1 GB of data into memory per supported language, and a word-prediction service that loads around 100 MB per supported language.
mod_wsgi is fine, we use that as well. What you need to do is make sure that your Apache server is compiled as mpm-worker (http://httpd.apache.org/docs/2.2/mod/worker.html). In this configuration your service runs in a multi-threaded environment instead of a multi-process environment. The effect is that you only fire up one interpreter per server process, which then runs your service in several underlying threads that share resources. The caveat is that you have to make sure that your service does not step on its own toes, meaning you will have to protect global variables and class-static variables shared between service class instances with mutex.acquire()/mutex.release().
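A minimal sketch of protecting class-level state that several worker threads modify. RequestStats and simulate_requests are hypothetical names used for illustration; they are not part of Ladon or mod_wsgi. The with statement on a threading.Lock is equivalent to calling acquire() and release() around the block.

```python
import threading

class RequestStats:
    """Hypothetical service-level state shared by all worker threads."""
    _lock = threading.Lock()
    hits = 0

    @classmethod
    def record_hit(cls):
        # The read-modify-write below is not atomic, so guard it with the lock.
        with cls._lock:
            cls.hits += 1

def simulate_requests(count):
    """Call record_hit from `count` threads and wait for all of them."""
    threads = [threading.Thread(target=RequestStats.record_hit) for _ in range(count)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return RequestStats.hits
```

Data that is loaded once and only read afterwards, as in the question, does not need this; the lock matters only for state that requests change.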
Other than that, Ladon as a framework is built for multi-threaded environments.
Best regards Jakob Simon-Gaarde
The current backend name is accessible via
>>> import matplotlib.pyplot as plt
>>> plt.get_backend()
'GTKAgg'
Is there a way to get a list of all backends that can be used on a particular machine?
You can access the lists
matplotlib.rcsetup.interactive_bk
matplotlib.rcsetup.non_interactive_bk
matplotlib.rcsetup.all_backends
the third being the concatenation of the former two. If I read the source code correctly, those lists are hard-coded though, and don't tell you what backends are actually usable. There is also
matplotlib.rcsetup.validate_backend(name)
but this also only checks against the hard-coded list.
Here is a modification of the script posted previously. It finds all supported backends, validates them and measures their fps. On OSX it crashes python when it comes to tkAgg, so use at your own risk ;)
from __future__ import print_function, division, absolute_import
from pylab import *
import time
import matplotlib.backends
import matplotlib.pyplot as p
import os.path
def is_backend_module(fname):
    """Identifies if a filename is a matplotlib backend module"""
    return fname.startswith('backend_') and fname.endswith('.py')
def backend_fname_formatter(fname):
    """Removes the extension of the given filename, then takes away the leading 'backend_'."""
    return os.path.splitext(fname)[0][8:]
# get the directory where the backends live
backends_dir = os.path.dirname(matplotlib.backends.__file__)
# filter all files in that directory to identify all files which provide a backend
backend_fnames = filter(is_backend_module, os.listdir(backends_dir))
backends = [backend_fname_formatter(fname) for fname in backend_fnames]
print("supported backends: \t" + str(backends))
# validate backends
backends_valid = []
for b in backends:
    try:
        p.switch_backend(b)
        backends_valid += [b]
    except:
        continue
print("valid backends: \t" + str(backends_valid))
# try backends performance
for b in backends_valid:
    ion()
    try:
        p.switch_backend(b)
        clf()
        tstart = time.time() # for profiling
        x = arange(0,2*pi,0.01) # x-array
        line, = plot(x,sin(x))
        for i in arange(1,200):
            line.set_ydata(sin(x+i/10.0)) # update the data
            draw() # redraw the canvas
        print(b + ' FPS: \t' , 200/(time.time()-tstart))
        ioff()
    except:
        print(b + " error :(")
To just see supported interactive backends see:
#!/usr/bin/env python
from __future__ import print_function
import matplotlib.pyplot as plt
import matplotlib
backends = matplotlib.rcsetup.interactive_bk
# validate backends
backends_valid = []
for b in backends:
    try:
        plt.switch_backend(b)
        backends_valid += [b]
    except:
        continue
print(backends_valid)
You can pretend to put a wrong backend argument, then it will return you a ValueError with the list of valid matplotlib backends, like this:
Input:
import matplotlib
matplotlib.use('test')  # any invalid backend name triggers the error
Output:
ValueError: Unrecognized backend string 'test': valid strings are ['GTK3Agg', 'GTK3Cairo', 'MacOSX', 'nbAgg', 'Qt4Agg', 'Qt4Cairo', 'Qt5Agg', 'Qt5Cairo', 'TkAgg', 'TkCairo', 'WebAgg', 'WX', 'WXAgg', 'WXCairo', 'agg', 'cairo', 'pdf', 'pgf', 'ps', 'svg', 'template']
There is the hard-coded list mentioned by Sven, but to find every backend which Matplotlib can use (based on the current implementation for setting up a backend) the matplotlib/backends folder can be inspected.
The following code does this:
import matplotlib.backends
import os.path
def is_backend_module(fname):
    """Identifies if a filename is a matplotlib backend module"""
    return fname.startswith('backend_') and fname.endswith('.py')
def backend_fname_formatter(fname):
    """Removes the extension of the given filename, then takes away the leading 'backend_'."""
    return os.path.splitext(fname)[0][8:]
# get the directory where the backends live
backends_dir = os.path.dirname(matplotlib.backends.__file__)
# filter all files in that directory to identify all files which provide a backend
backend_fnames = filter(is_backend_module, os.listdir(backends_dir))
backends = [backend_fname_formatter(fname) for fname in backend_fnames]
print(backends)  # parenthesised form works on both Python 2 and Python 3
You can also see some documentation for a few backends here:
http://matplotlib.org/api/index_backend_api.html
The page lists just a few backends, and some of them don't have proper documentation:
matplotlib.backend_bases
matplotlib.backends.backend_gtkagg
matplotlib.backends.backend_qt4agg
matplotlib.backends.backend_wxagg
matplotlib.backends.backend_pdf
matplotlib.dviread
matplotlib.type1font
What about this?
%matplotlib --list
Available matplotlib backends: ['tk', 'gtk', 'gtk3', 'wx', 'qt4', 'qt5', 'qt', 'osx', 'nbagg', 'notebook', 'agg', 'svg', 'pdf', 'ps', 'inline', 'ipympl', 'widget']
You could look at the following folder for a list of possible backends...
/Library/Python/2.6/site-packages/matplotlib/backends
/usr/lib64/Python2.6/site-packages/matplotlib/backends