I wrote a small API for screen captures: a Flask app that passes capture requests to CasperJS using subprocess.Popen.
In development on a Mac, and when I run the server in a shell on my production server (Ubuntu 13.04), everything works great.
When I manage the server with supervisord, however, the subprocess call returns an error that CasperJS cannot find PhantomJS (Casper runs on Phantom).
The error thrown is:
Fatal: [Errno 2] No such file or directory; did you install phantomjs?
The code is all open source.
Here is the subprocess call:
https://github.com/pwalsh/moment/blob/master/moment/models.py#L215
Here is the supervisor conf file for the server (I generate the actual file with Fabric, but it should be clear):
https://github.com/pwalsh/moment/blob/master/fabfile/templates.py#L56
There are only two users on the system: root, and my app's user. When I log on to the machine as either of these users, I can run a dev server successfully, and I can run PhantomJS and CasperJS successfully.
Why does my subprocess call error out under supervisord?
Edit: Adding code + stacktrace
Supervisord conf for the gunicorn server:
; Generated via Fabric on 2013-08-18 23:05:50.928087
; gunicorn configuration for Moment
[program:moment-gunicorn]
command=/srv/environments/moment/bin/gunicorn moment:app --bind 127.0.0.1:9000 --workers 4 --timeout 30 --access-logfile /srv/logs/moment_gunicorn_access.log --error-logfile /srv/logs/moment_gunicorn_error.log
environment=PATH="/srv/environments/moment/bin"
directory=/srv/projects/moment
user=moment
autostart=true
autorestart=true
The code that sends data to the CasperJS/PhantomJS subprocess. It is a method of a class; the full code is here:
def capture(self):
    filename = '{key}.{format}'.format(key=self.get_key().lstrip(self.prefix),
                                       format=self.arguments['format'])
    image = os.path.join(conf.CAPTURES_ROOT, filename)
    params = [conf.CASPER, conf.CAPTURE_SCRIPT, self.arguments['url'],
              image, self.arguments['viewport'], self.arguments['target']]
    casper = subprocess.Popen(params, stdout=subprocess.PIPE,
                              stderr=subprocess.PIPE)
    casper_output, casper_errors = casper.communicate()
    logging.info(casper_output)
    logging.info(casper_errors)
    logging.info(casper.returncode)
    # Here we are relying on convention:
    # if success, subprocess.returncode == 0.
    # This could be fragile; need to investigate.
    if casper.returncode:
        raise Exception(casper_errors)
    else:
        return image
Traceback:
WARNING:root:Fatal: [Errno 2] No such file or directory; did you install phantomjs?
WARNING:root:
WARNING:root:1
ERROR:moment:Exception on /capture/ [GET]
Traceback (most recent call last):
File "/srv/environments/moment/local/lib/python2.7/site-packages/flask/app.py", line 1817, in wsgi_app
response = self.full_dispatch_request()
File "/srv/environments/moment/local/lib/python2.7/site-packages/flask/app.py", line 1477, in full_dispatch_request
rv = self.handle_user_exception(e)
File "/srv/environments/moment/local/lib/python2.7/site-packages/flask/app.py", line 1381, in handle_user_exception
reraise(exc_type, exc_value, tb)
File "/srv/environments/moment/local/lib/python2.7/site-packages/flask/app.py", line 1475, in full_dispatch_request
rv = self.dispatch_request()
File "/srv/environments/moment/local/lib/python2.7/site-packages/flask/app.py", line 1461, in dispatch_request
return self.view_functions[rule.endpoint](**req.view_args)
File "/srv/projects/moment/moment/views.py", line 45, in get_capture
image = capture.capture()
File "/srv/projects/moment/moment/models.py", line 229, in capture
raise Exception(casper_errors)
Exception
Note:
I am running in a virtualenv called "moment", and under a user called "moment".
The error is in the casper_output variable; those first three warnings are the ones I log when I start the subprocess.
I note that those warnings are raised by root. I'd have expected them to be raised by "moment", the user the supervisord process is supposed to run as.
Although (for reasons you should investigate) the user has escalated from the original moment user to root, this does not mean that the process has the environment you get when you log in to a shell as root.
The chances are that your PATH is only what is set in your supervisord.conf, and that's why phantomjs appears to be absent.
Environments are not looked up in some database of users; instead they are constructed by explicitly setting values (e.g. using a script) or by inheritance from the spawning process. In this case, you are inheriting from supervisor, and will get whatever environment supervisor had. If supervisor was run by something like cron, that environment will be empty.
Best practice in relation to supervisor is either to run it using a wrapper script which sets up the environment correctly, or just to explicitly set everything in the supervisord.conf. I generally recommend the latter, unless you have a common set of environment fixups in a file used by a lot of scripts (such as because you want it to run inside a virtualenv).
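For example, here is a sketch of the second option applied to the conf above (the extra directories are an assumption; point them at wherever `which phantomjs` and `which casperjs` resolve on your server):

; Keep the virtualenv first, but let child processes also find
; system-wide binaries such as phantomjs and casperjs.
environment=PATH="/srv/environments/moment/bin:/usr/local/bin:/usr/bin:/bin"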
Related
So I'm pretty new to Flask and I've been following a tutorial, but I've run into a strange problem. I have the following code, a Python 3 script, launched in Atom with the Hydrogen package enabled and Jupyter. I have the virtual environment running in the background in a terminal.
from flask import Flask

app = Flask(__name__)

if __name__ == '__main__':
    app.run(debug=True)
So obviously the script is very basic and is just used to see if I can connect to localhost. However, when running it (through Hydrogen) I get the following error:
Serving Flask app 'main'
Debug mode: on
WARNING: This is a development server. Do not use it in a production deployment. Use a production WSGI server instead.
Running on http://127.0.0.1:5000
Press CTRL+C to quit
Restarting with stat
Traceback (most recent call last):
File "/home/uses12/.local/lib/python3.10/site-packages/ipykernel_launcher.py", line 17, in
app.launch_new_instance()
File "/home/uses12/.local/lib/python3.10/site-packages/traitlets/config/application.py", line 977, in launch_instance
app.initialize(argv)
File "/home/uses12/.local/lib/python3.10/site-packages/traitlets/config/application.py", line 110, in inner
return method(app, *args, **kwargs)
File "/home/uses12/.local/lib/python3.10/site-packages/ipykernel/kernelapp.py", line 666, in initialize
self.init_sockets()
File "/home/uses12/.local/lib/python3.10/site-packages/ipykernel/kernelapp.py", line 307, in init_sockets
self.shell_port = self._bind_socket(self.shell_socket, self.shell_port)
File "/home/uses12/.local/lib/python3.10/site-packages/ipykernel/kernelapp.py", line 244, in _bind_socket
return self._try_bind_socket(s, port)
File "/home/uses12/.local/lib/python3.10/site-packages/ipykernel/kernelapp.py", line 220, in _try_bind_socket
s.bind("tcp://%s:%i" % (self.ip, port))
File "/home/uses12/.local/lib/python3.10/site-packages/zmq/sugar/socket.py", line 232, in bind
super().bind(addr)
File "zmq/backend/cython/socket.pyx", line 568, in zmq.backend.cython.socket.Socket.bind
File "zmq/backend/cython/checkrc.pxd", line 28, in zmq.backend.cython.checkrc._check_rc
zmq.error.ZMQError: Address already in use
Upon looking online, a possible solution was to specify the port in the app.run() parameters; however, no matter what port I specify, it ALWAYS says that the address is already in use, leading me to believe that there is some deeper issue. I've checked which services are using those ports, and it's always just two instances of python3, I guess because Flask is set to debug. I'm completely lost. Can anyone help? Thanks so much.
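One workaround is to keep debug mode but disable the reloader. A minimal sketch, assuming the issue is Werkzeug's reloader re-executing the process ("Restarting with stat" re-launches ipykernel, which then fails to re-bind its already-in-use ZMQ ports):

from flask import Flask

app = Flask(__name__)

if __name__ == '__main__':
    # use_reloader=False stops Werkzeug from re-executing the process;
    # inside a Jupyter/Hydrogen kernel, that re-exec starts a second
    # ipykernel that collides with the first one's ZMQ sockets.
    app.run(debug=True, use_reloader=False)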
I'm attempting to write a Lambda function in Golang (doing development on my MacBook with Docker Desktop running) following this quick-start. I perform the following steps:
Ran "sam init --runtime go1.x --name testing" to generate a function from the template.
Ran "make deps" to go get dependencies.
Ran "make build" to build the code that was generated to hello-world/hello-word.
Ran "sam local start-api".
Everything looks like it starts correctly:
Mounting HelloWorldFunction at http://127.0.0.1:3000/hello [GET]
You can now browse to the above endpoints to invoke your functions. You do not need to restart/reload SAM CLI while working on your functions, changes will be reflected instantly/automatically. You only need to restart SAM CLI if you update your AWS SAM template
2019-11-16 10:39:19 * Running on http://127.0.0.1:3000/ (Press CTRL+C to quit)
But when I curl the endpoint, I get a 502 and I see an error:
2019-11-16 10:39:23 Exception on /hello [HEAD]
Traceback (most recent call last):
File "/usr/local/Cellar/aws-sam-cli/0.31.0/libexec/lib/python3.7/site-packages/flask/app.py", line 2317, in wsgi_app
response = self.full_dispatch_request()
File "/usr/local/Cellar/aws-sam-cli/0.31.0/libexec/lib/python3.7/site-packages/flask/app.py", line 1840, in full_dispatch_request
rv = self.handle_user_exception(e)
File "/usr/local/Cellar/aws-sam-cli/0.31.0/libexec/lib/python3.7/site-packages/flask/app.py", line 1743, in handle_user_exception
reraise(exc_type, exc_value, tb)
File "/usr/local/Cellar/aws-sam-cli/0.31.0/libexec/lib/python3.7/site-packages/flask/_compat.py", line 36, in reraise
raise value
File "/usr/local/Cellar/aws-sam-cli/0.31.0/libexec/lib/python3.7/site-packages/flask/app.py", line 1838, in full_dispatch_request
rv = self.dispatch_request()
File "/usr/local/Cellar/aws-sam-cli/0.31.0/libexec/lib/python3.7/site-packages/flask/app.py", line 1824, in dispatch_request
return self.view_functions[rule.endpoint](**req.view_args)
File "/usr/local/Cellar/aws-sam-cli/0.31.0/libexec/lib/python3.7/site-packages/samcli/local/apigw/local_apigw_service.py", line 172, in _request_handler
route = self._get_current_route(request)
File "/usr/local/Cellar/aws-sam-cli/0.31.0/libexec/lib/python3.7/site-packages/samcli/local/apigw/local_apigw_service.py", line 236, in _get_current_route
raise KeyError("Lambda function for the route not found")
KeyError: 'Lambda function for the route not found'
2019-11-16 10:39:23 127.0.0.1 - - [16/Nov/2019 10:39:23] "HEAD /hello HTTP/1.1" 502 -
What am I missing?
Late to the party on this one, and you've probably resolved it already. Without seeing any code, it looks like you've made a HEAD request rather than a GET request, and I suspect you have no route for the HEAD method, only GET.
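For example (assuming the generated template only routes GET /hello): plain curl sends a GET, while curl -I sends a HEAD, which reproduces the 502 above:

curl http://127.0.0.1:3000/hello       # GET  - matches the route
curl -I http://127.0.0.1:3000/hello    # HEAD - no route, so sam returns 502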
The Pybossa documentation doesn't describe how to configure a webhook.
I ran into some issues while configuring a webhook; here are my steps:
Fork the pybossa webhook example.
Run webhook with default settings (modified api_key and endpoint).
In Pybossa, modify the project and add a webhook pointing to the URL where webhook is running.
Open a command line window and execute the following command:
# rqworker high
Then, when a task is completed, I see logs in the command line window complaining with the following error:
14:06:11 *** Listening on high...
14:07:42 high: pybossa.jobs.webhook(u'http://192.168.116.135:5001', {'project_short_name': u'tw', 'task_id': 172, 'fired_at': '2017-08-10 06:07:42', 'project_id': 17, 'result_id': 75, 'event': 'task_completed'}) (e435386c-615d-4525-a65d-f08f0afd2351)
14:07:44 UnboundLocalError: local variable 'project' referenced before assignment
Traceback (most recent call last):
File "/home/baib2/Desktop/pybossa_server/env/local/lib/python2.7/site-packages/rq/worker.py", line 479, in perform_job
rv = job.perform()
File "/home/baib2/Desktop/pybossa_server/env/local/lib/python2.7/site-packages/rq/job.py", line 466, in perform
self._result = self.func(*self.args, **self.kwargs)
File "./pybossa/jobs.py", line 525, in webhook
if project.published and webhook.response_status_code != 200 and current_app.config.get('ADMINS'):
UnboundLocalError: local variable 'project' referenced before assignment
I'm not sure if we should execute the following command
# rqworker high
But if this rqworker is not running, I don't see any component picking up work from the Redis queue.
You need to run a specific worker, not the default one from PYBOSSA. Just use https://github.com/Scifabric/pybossa/blob/master/app_context_rqworker.py to run it like this:
python app_context_rqworker.py high
This will set up the Flask context, and it will run properly ;-)
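For the curious, the general pattern such a wrapper implements looks roughly like this (a hedged sketch, not PYBOSSA's actual file; create_app here is a stand-in for however the real app is built):

import sys
from flask import Flask
from rq import Connection, Queue, Worker

def create_app():
    # Stand-in for the real application factory.
    return Flask(__name__)

app = create_app()

# Run the worker inside the application context so jobs that touch
# current_app (like pybossa.jobs.webhook) can resolve it.
with app.app_context():
    with Connection():
        queues = [Queue(name) for name in (sys.argv[1:] or ['high'])]
        Worker(queues).work()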
We're in the middle of improving our docs, so this should be better in the coming months.
I'm writing a Python UI in Kivy to manage some remote machines with Fabric. As I can't use Fabric's parallel implementation on Windows 10 (see here), I was hoping to use parallel-ssh to perform the parallel remote operations. This issue seems to be caused by the interactions between the libraries, rather than by a problem with any single one of them.
I've tried manually loading my private key as suggested here:
from fabric.api import execute
import pssh
from pssh.utils import load_private_key

hosts = ['192.168.0.2']
private_key = load_private_key('C:/Users/democracy/.ssh/id_rsa')
pssh_client = pssh.ParallelSSHClient(hosts, user='XXX', password='YYY', pkey=private_key)
output = pssh_client.run_command('whoami', sudo=True)
pssh_client.join(output)
for host in output:
    for line in output[host]['stdout']:
        print("Host %s - output: %s" % (host, line))
The above code results in the following backtrace:
Exception: Error reading SSH protocol banner('This operation would block forever', <Hub at 0x242566ab9c8 select pending=0 ref=0>)
Traceback (most recent call last):
File "C:\environments\democracy\lib\site-packages\paramiko\transport.py", line 1884, in _check_banner
buf = self.packetizer.readline(timeout)
File "C:\environments\democracy\lib\site-packages\paramiko\packet.py", line 331, in readline
buf += self._read_timeout(timeout)
File "C:\environments\democracy\lib\site-packages\paramiko\packet.py", line 485, in _read_timeout
x = self.__socket.recv(128)
File "C:\environments\democracy\lib\site-packages\gevent\_socket3.py", line 317, in recv
self._wait(self._read_event)
File "C:\environments\democracy\lib\site-packages\gevent\_socket3.py", line 144, in _wait
self.hub.wait(watcher)
File "C:\environments\democracy\lib\site-packages\gevent\hub.py", line 630, in wait
result = waiter.get()
File "C:\environments\democracy\lib\site-packages\gevent\hub.py", line 878, in get
return self.hub.switch()
File "C:\environments\democracy\lib\site-packages\gevent\hub.py", line 609, in switch
return greenlet.switch(self)
gevent.hub.LoopExit: ('This operation would block forever', <Hub at 0x242566ab9c8 select pending=0 ref=0>)
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "C:\environments\democracy\lib\site-packages\paramiko\transport.py", line 1740, in run
self._check_banner()
File "C:\environments\democracy\lib\site-packages\paramiko\transport.py", line 1888, in _check_banner
raise SSHException('Error reading SSH protocol banner' + str(e))
paramiko.ssh_exception.SSHException: Error reading SSH protocol banner('This operation would block forever', <Hub at 0x242566ab9c8 select pending=0 ref=0>)
General SSH error - Error reading SSH protocol banner('This operation would block forever', <Hub at 0x242566ab9c8 select pending=0 ref=0>)
The above code works if I import pssh before fabric. Unfortunately, it seems that if I do this, any buttons on my Kivy interface (which kick off operations in a background thread) block forever on press. If I go to the console after a button press and send a keyboard interrupt, Kivy stops blocking and begins cleanup, but executes the command from the button press before exiting. The stack trace on sending this interrupt is below:
[INFO ] [Base ] Leaving application in progress...
Traceback (most recent call last):
File "machine_control_ui.py", line 7, in <module>
DemocracyControllerApp().run()
File "C:\environments\democracy\lib\site-packages\kivy\app.py", line 828, in run
runTouchApp()
File "C:\environments\democracy\lib\site-packages\kivy\base.py", line 504, in runTouchApp
EventLoop.window.mainloop()
File "C:\environments\democracy\lib\site-packages\kivy\core\window\window_sdl2.py", line 659, in mainloop
self._mainloop()
File "C:\environments\democracy\lib\site-packages\kivy\core\window\window_sdl2.py", line 405, in _mainloop
EventLoop.idle()
File "C:\environments\democracy\lib\site-packages\kivy\base.py", line 339, in idle
Clock.tick()
File "C:\environments\democracy\lib\site-packages\kivy\clock.py", line 553, in tick
current = self.idle()
File "C:\environments\democracy\lib\site-packages\kivy\clock.py", line 533, in idle
usleep(1000000 * sleeptime)
File "C:\environments\democracy\lib\site-packages\kivy\clock.py", line 717, in usleep
_usleep(microseconds, self._sleep_obj)
File "C:\environments\democracy\lib\site-packages\kivy\clock.py", line 395, in _usleep
_kernel32.WaitForSingleObject(obj, 0xffffffff)
KeyboardInterrupt
*** BUTTON PRESS OPERATION OUTPUTS HERE ***
Any insight into why this might be happening and how I can avoid it would be much appreciated. I could potentially investigate other parallel ssh solutions (although I imagine anything using paramiko would have the same issue), or manually kick off a thread per host to achieve the parallel operation otherwise (which probably has its own list of headaches), but I'd prefer to just use the parallel-ssh library if there's a workable solution.
I'm using parallel-ssh 0.92.2 on Python 3 and Windows 10.
From the docs:
parallel-ssh uses gevent’s monkey patching to enable asynchronous use
of the Python standard library’s network I/O.
Make sure that ParallelSSH imports come before any other imports in
your code. Otherwise, patching may not be done before the standard
library is loaded which will then cause ParallelSSH to block.
If you are seeing messages like This operation would block forever,
this is the cause.
Monkey patching is only done for the clients under pssh.pssh_client
and pssh.ssh_client for parallel and single host clients respectively.
New native library based clients under pssh.pssh2_client and
pssh.ssh2_client do not perform monkey patching and are an option if
monkey patching is not suitable. These clients will become the default
in a future major release - 2.0.0.
Since monkey patching is used for the client you are using, other uses of the threading, socket etc modules in your application will also have been patched to use gevent which means they no longer run in a native thread but in a co-routine/greenlet.
This is the reason your background thread operations block as they run in a greenlet on the same thread rather than a new thread.
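You can see the effect directly. A minimal sketch (the printed class names may vary slightly by gevent version):

from gevent import monkey
monkey.patch_all()  # effectively what importing pssh.pssh_client does

import socket
import threading

# Both modules now resolve to gevent's cooperative implementations,
# so "threads" are greenlets scheduled on the one real OS thread.
print(socket.socket)
print(threading.current_thread())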
As of 1.2.0, a new client based on libssh2 instead of paramiko is available which does not use monkey patching:
from pssh.pssh2_client import ParallelSSHClient
<..>
Rest of your application can then use the standard library as-is.
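Adapting the question's snippet to the native client might look like this (a sketch against the 1.x API, where per-host output is accessed as attributes rather than dict keys):

from pssh.pssh2_client import ParallelSSHClient

hosts = ['192.168.0.2']

# No monkey patching here, so fabric/kivy imports can come in any
# order and real background threads keep working.
client = ParallelSSHClient(hosts, user='XXX', password='YYY')
output = client.run_command('whoami', sudo=True)
client.join(output)
for host, host_output in output.items():
    for line in host_output.stdout:
        print("Host %s - output: %s" % (host, line))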
I have a Django app being served with nginx+gunicorn with 3 gunicorn worker processes. Occasionally (maybe once every 100 requests or so) one of the worker processes gets into a state where it starts failing most (but not all) requests that it serves, and then it throws an exception when it tries to email me about it. The gunicorn error logs look like this:
[2015-04-29 10:41:39 +0000] [20833] [ERROR] Error handling request
Traceback (most recent call last):
File "/home/django/virtualenvs/homestead_django/local/lib/python2.7/site-packages/gunicorn/workers/sync.py", line 130, in handle
File "/home/django/virtualenvs/homestead_django/local/lib/python2.7/site-packages/gunicorn/workers/sync.py", line 171, in handle_request
File "/home/django/virtualenvs/homestead_django/local/lib/python2.7/site-packages/django/core/handlers/wsgi.py", line 206, in __call__
File "/home/django/virtualenvs/homestead_django/local/lib/python2.7/site-packages/django/core/handlers/base.py", line 196, in get_response
File "/home/django/virtualenvs/homestead_django/local/lib/python2.7/site-packages/django/core/handlers/base.py", line 226, in handle_uncaught_exception
File "/usr/lib/python2.7/logging/__init__.py", line 1178, in error
File "/usr/lib/python2.7/logging/__init__.py", line 1271, in _log
File "/usr/lib/python2.7/logging/__init__.py", line 1281, in handle
File "/usr/lib/python2.7/logging/__init__.py", line 1321, in callHandlers
File "/usr/lib/python2.7/logging/__init__.py", line 749, in handle
File "/home/django/virtualenvs/homestead_django/local/lib/python2.7/site-packages/django/utils/log.py", line 122, in emit
File "/home/django/virtualenvs/homestead_django/local/lib/python2.7/site-packages/django/utils/log.py", line 125, in connection
File "/home/django/virtualenvs/homestead_django/local/lib/python2.7/site-packages/django/core/mail/__init__.py", line 29, in get_connection
File "/home/django/virtualenvs/homestead_django/local/lib/python2.7/site-packages/django/utils/module_loading.py", line 26, in import_by_path
File "/home/django/virtualenvs/homestead_django/local/lib/python2.7/site-packages/django/utils/module_loading.py", line 21, in import_by_path
File "/home/django/virtualenvs/homestead_django/local/lib/python2.7/site-packages/django/utils/importlib.py", line 40, in import_module
ImproperlyConfigured: Error importing module django.core.mail.backends.smtp: "No module named smtp"
So some uncaught exception is happening and then Django is trying to email me about it. The fact that it can't import django.core.mail.backends.smtp doesn't make sense because django.core.mail.backends.smtp should definitely be on the worker process's Python path. I can import it just fine from a manage.py shell, and I do get emails for other server errors (actual software bugs), so I know that works. It's like the worker process's environment is corrupted somehow.
Once a worker process enters this state it has a really hard time recovering; almost every request it serves ends up failing in this same manner. If I restart gunicorn everything is good (until another worker process falls into this weird state again).
I don't notice any obvious patterns, so I don't think this is being triggered by a bug in my app (the URLs erroring out are different, etc.). It seems like some sort of race condition.
Currently I'm using gunicorn's --max-requests option to mitigate this problem but I'd like to understand what's going on here. Is this a race condition? How can I debug this?
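For reference, the mitigation mentioned above simply recycles workers so a corrupted one cannot fail requests indefinitely. A sketch (the module path and numbers are placeholders, not the asker's actual command):

# Restart each worker after serving 500 requests; a worker that enters
# the bad state then only affects a bounded number of requests.
gunicorn homestead_django.wsgi:application --workers 3 --max-requests 500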
I suggest you use Sentry, which gives you a smart way of handling errors.
You can use it as a cloud-based solution (getsentry) or you can install it on your own server (github).
Before, I was using the Django core log mailer; now I always use Sentry.
I do not work at Sentry, but their solution is pretty awesome!
We discovered one particular view that was pegging the CPU for a few seconds every time it was loaded that seemed to be triggering this issue. I still don't understand how slamming a gunicorn worker could result in a corrupted execution environment, but fixing the high-CPU view seems to have gotten rid of this issue.