tl;dr: An app that had been working fine is suddenly throwing a "Bad file descriptor" error with no other changes; I need advice on how to diagnose this.
I inherited an app that had been untouched for years, after the server crashed and I needed to move it to another machine. It's built with Flask, and uses Peewee to talk to a Postgres database over psycopg2. It has a bunch of other stuff--an Elasticsearch engine for searching, a lot of heavy JS on the front end--but that doesn't seem to be the problem here. The code is moderately complex, and I am not very knowledgeable about all of its pieces.
It took me a while to get it set up using the sketchy deployment instructions that had been left behind, but eventually I got it running, and was able to get a test version running on a clean VM and then deploy it on an actual server, using gunicorn and nginx. It's been working fine in production for a week. I'm using Debian Buster for all versions. I'm using the most recent versions of all software.
I then decided to do some basic code cleanup and ran the entire app through a linter, before starting on some other changes that the end user had requested. Unfortunately, after this, the app consistently fails at the same point with a "Bad file descriptor" error. This happens in a pre-run section that parses a large XML file and saves the info to the database and to Elasticsearch; the app receives an XML upload, forks a few processes, and runs the parse/index process in the background.
Since then I have been unable to get past this error by any means. I have launched a clean VM and installed everything from scratch; I've reverted the git repo to before I linted the code. Same problem. I don't see how it can be a code issue, as the code is now back where it was when I started. But I'm at a loss for what to do, and terrified that the production machine will fail.
The errors I get (trimming the first few lines that refer to places in the app itself) are:
[2021-03-14 14:40:11.699837] self.execute()
[2021-03-14 14:40:11.699878] File "/home/deploy/git/myapp/venv/lib/python3.7/site-packages/peewee.py", line 1906, in inner
[2021-03-14 14:40:11.699907] return method(self, database, *args, **kwargs)
[2021-03-14 14:40:11.699946] File "/home/deploy/git/myapp/venv/lib/python3.7/site-packages/peewee.py", line 1977, in execute
[2021-03-14 14:40:11.699976] return self._execute(database)
[2021-03-14 14:40:11.700004] File "/home/deploy/git/myapp/venv/lib/python3.7/site-packages/peewee.py", line 2149, in _execute
[2021-03-14 14:40:11.700032] cursor = database.execute(self)
[2021-03-14 14:40:11.700060] File "/home/deploy/git/myapp/venv/lib/python3.7/site-packages/peewee.py", line 3156, in execute
[2021-03-14 14:40:11.700088] return self.execute_sql(sql, params, commit=commit)
[2021-03-14 14:40:11.700115] File "/home/deploy/git/myapp/venv/lib/python3.7/site-packages/peewee.py", line 3150, in execute_sql
[2021-03-14 14:40:11.700143] self.commit()
[2021-03-14 14:40:11.700171] File "/home/deploy/git/myapp/venv/lib/python3.7/site-packages/peewee.py", line 2916, in __exit__
[2021-03-14 14:40:11.700198] reraise(new_type, new_type(exc_value, *exc_args), traceback)
[2021-03-14 14:40:11.700226] File "/home/deploy/git/myapp/venv/lib/python3.7/site-packages/peewee.py", line 190, in reraise
[2021-03-14 14:40:11.700254] raise value.with_traceback(tb)
[2021-03-14 14:40:11.700282] File "/home/deploy/git/myapp/venv/lib/python3.7/site-packages/peewee.py", line 3143, in execute_sql
[2021-03-14 14:40:11.700309] cursor.execute(sql, params or ())
[2021-03-14 14:40:11.700339] OperationalError('SSL SYSCALL error: Bad file descriptor\n')
127.0.0.1 - - [14/Mar/2021 10:40:11] "POST /manage/versions/upload HTTP/1.1" 500 -
Error on request:
Traceback (most recent call last):
File "/home/deploy/git/myapp/venv/lib/python3.7/site-packages/werkzeug/serving.py", line 323, in run_wsgi
execute(self.server.app)
File "/home/deploy/git/myapp/venv/lib/python3.7/site-packages/werkzeug/serving.py", line 315, in execute
write(data)
File "/home/deploy/git/myapp/venv/lib/python3.7/site-packages/werkzeug/serving.py", line 273, in write
self.send_response(code, msg)
File "/home/deploy/git/myapp/venv/lib/python3.7/site-packages/werkzeug/serving.py", line 388, in send_response
self.wfile.write(hdr.encode("ascii"))
File "/usr/lib/python3.7/socketserver.py", line 799, in write
self._sock.sendall(b)
OSError: [Errno 9] Bad file descriptor
Exception in thread Thread-22:
Traceback (most recent call last):
File "/usr/lib/python3.7/threading.py", line 917, in _bootstrap_inner
self.run()
File "/usr/lib/python3.7/threading.py", line 865, in run
self._target(*self._args, **self._kwargs)
File "/usr/lib/python3.7/socketserver.py", line 654, in process_request_thread
self.shutdown_request(request)
File "/usr/lib/python3.7/socketserver.py", line 509, in shutdown_request
self.close_request(request)
File "/usr/lib/python3.7/socketserver.py", line 513, in close_request
request.close()
File "/usr/lib/python3.7/socket.py", line 420, in close
self._real_close()
File "/usr/lib/python3.7/socket.py", line 414, in _real_close
_ss.close(self)
OSError: [Errno 9] Bad file descriptor
I note that the final section ("Exception in thread Thread-22") is showing the system Python, rather than my virtual environment; I don't know if that's relevant, or if that's just what's running some overall process. I didn't get to this point doing anything different, though--the app is running in the virtual environment.
I'd be very grateful for any thoughts here--I'm obviously hoping it's some kind of stupid permission error or something, as I can't easily go into the code because of its complexity.
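One way to start evaluating this is to reproduce the errno in isolation. Errno 9 (EBADF) means the process used a descriptor number that is not open in that process at that moment. The sketch below is not taken from the app: it closes a socket's descriptor behind the object's back with os.close() and then tries to send on it. The helper name use_after_close is hypothetical, and the idea that something in the app closes inherited descriptors around the fork is an assumption, not something the traceback proves.

```python
import errno
import os
import socket


def use_after_close():
    """Return the errno raised when a socket is used after its fd was closed."""
    left, right = socket.socketpair()
    try:
        # Close the descriptor directly; the socket object still believes it is open.
        os.close(left.fileno())
        try:
            left.sendall(b"ping")
        except OSError as exc:
            return exc.errno
        return None
    finally:
        # Forget the stale descriptor so no second close is attempted on it.
        left.detach()
        right.close()


result = use_after_close()
print(result == errno.EBADF)
```

The point of the sketch is that EBADF is raised inside the process that holds a stale descriptor, so a reasonable first place to look is whatever closes or replaces descriptors around the fork (daemonizing code, connection cleanup in a child), rather than file permissions.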
Related
I am using Celery and Kafka to run some jobs in order to push data to Kafka. I also use Faust to connect the workers. But unfortunately, I got an error after running faust -A project.streams.app worker -l info in order to run the pipeline. I wonder if anyone can help me.
/home/admin/.local/lib/python3.6/site-packages/faust/fixups/django.py:71: UserWarning: Using settings.DEBUG leads to a memory leak, never
use this setting in production environments!
warnings.warn(WARN_DEBUG_ENABLED)
Command raised exception: ModuleNotFoundError("'kafka' is not a valid name. Did you mean one of aiokafka, kafka?",)
File "/home/admin/.local/lib/python3.6/site-packages/mode/worker.py", line 67, in exiting
yield
File "/home/admin/.local/lib/python3.6/site-packages/faust/cli/base.py", line 528, in _inner
cmd()
File "/home/admin/.local/lib/python3.6/site-packages/faust/cli/base.py", line 611, in __call__
self.run_using_worker(*args, **kwargs)
File "/home/admin/.local/lib/python3.6/site-packages/faust/cli/base.py", line 620, in run_using_worker
self.on_worker_created(worker)
File "/home/admin/.local/lib/python3.6/site-packages/faust/cli/worker.py", line 57, in on_worker_created
self.say(self.banner(worker))
File "/home/admin/.local/lib/python3.6/site-packages/faust/cli/worker.py", line 97, in banner
self._banner_data(worker))
File "/home/admin/.local/lib/python3.6/site-packages/faust/cli/worker.py", line 127, in _banner_data
(' transport', app.transport.driver_version),
File "/home/admin/.local/lib/python3.6/site-packages/faust/app/base.py", line 1831, in transport
self._transport = self._new_transport()
File "/home/admin/.local/lib/python3.6/site-packages/faust/app/base.py", line 1686, in _new_transport
return transport.by_url(self.conf.broker_consumer[0])(
File "/home/admin/.local/lib/python3.6/site-packages/mode/utils/imports.py", line 101, in by_url
return self.by_name(URL(url).scheme)
File "/home/admin/.local/lib/python3.6/site-packages/mode/utils/imports.py", line 115, in by_name
f'{name!r} is not a valid name. {alt}') from exc
I don't know what was wrong with Faust, but I happened to run pip install faust and that solved the problem.
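The last frame of the traceback ends in "from exc", so the "not a valid name" message is chained to an earlier exception that the log above does not show. Here is a minimal sketch of how such a chained error can be unwrapped to see the underlying cause. REGISTRY, by_name and root_cause are hypothetical stand-ins written for illustration, not Faust's or mode's actual code, and the missing module name is made up to simulate a broken install.

```python
import importlib

# Hypothetical registry mapping a URL scheme to a "module:attribute" path.
# "no_such_driver_module" is deliberately missing to simulate a broken install.
REGISTRY = {"kafka": "no_such_driver_module:Transport"}


def by_name(name):
    """Import the class registered for `name`, re-raising import failures."""
    module_name, _, attr = REGISTRY[name].partition(":")
    try:
        module = importlib.import_module(module_name)
    except ImportError as exc:
        # The original error is kept as __cause__ of the new one.
        raise ModuleNotFoundError(f"{name!r} is not a valid name.") from exc
    return getattr(module, attr)


def root_cause(exc):
    """Follow the __cause__ chain to the first exception that was raised."""
    while exc.__cause__ is not None:
        exc = exc.__cause__
    return exc


try:
    by_name("kafka")
except ModuleNotFoundError as err:
    cause = root_cause(err)
    print(type(cause).__name__, cause.name)
else:
    cause = None
```

If the real cause turns out to be a missing or half-installed dependency, that would be consistent with a reinstall fixing it, but the traceback alone does not confirm this.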
Need some help! While running a Python script that uses RabbitMQ RPC, I am getting a "Socket 104, Socket closed when connection was open" error. Below is the Python traceback and some code:
Traceback (most recent call last):
File "./server.py", line 34, in <module>
channel.start_consuming()
File "/usr/lib/python2.6/site-packages/pika/adapters/blocking_connection.py", line 1681, in start_consuming
self.connection.process_data_events(time_limit=None)
File "/usr/lib/python2.6/site-packages/pika/adapters/blocking_connection.py", line 656, in process_data_events
self._dispatch_channel_events()
File "/usr/lib/python2.6/site-packages/pika/adapters/blocking_connection.py", line 469, in _dispatch_channel_events
impl_channel._get_cookie()._dispatch_events()
File "/usr/lib/python2.6/site-packages/pika/adapters/blocking_connection.py", line 1310, in _dispatch_events
evt.body)
File "./server.py", line 30, in on_request
body=json.dumps(DEVICE_INFO))
File "/usr/lib/python2.6/site-packages/pika/adapters/blocking_connection.py", line 1978, in basic_publish
mandatory, immediate)
File "/usr/lib/python2.6/site-packages/pika/adapters/blocking_connection.py", line 2065, in publish
self._flush_output()
File "/usr/lib/python2.6/site-packages/pika/adapters/blocking_connection.py", line 1174, in _flush_output
*waiters)
File "/usr/lib/python2.6/site-packages/pika/adapters/blocking_connection.py", line 395, in _flush_output
raise exceptions.ConnectionClosed()
pika.exceptions.ConnectionClosed
Apologies, I am unable to comment due to low reputation. Could you provide a little more information on how you are opening your connection? Is it really open?
It might be caused by losing the connection to the RabbitMQ server: pika does not recover from disconnects on its own, and a dropped connection often results in a similar stack trace.
I had a similar problem. In my case the pika connection was dropping after some time, and my colleague was able to deal with this by adding a wait time for mq:port_number.
We were using a Docker container, so we added the following line to our invoke.sh to wait for mq:
filename.py --wait-secs 30 --port-wait mq:5672
I hope you are able to resolve this after doing that.
Otherwise, it would be better to check whether the connection is being dropped by pika before your Python script runs, or to provide more information on how you are invoking it.
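The "wait for mq" step can also be sketched in plain Python. This is only an illustration of the idea, using the standard library: wait_for_port and its parameters are hypothetical names, not the options of the script mentioned above, and the demo connects to a throwaway local listener instead of a real broker on mq:5672.

```python
import socket
import time


def wait_for_port(host, port, wait_secs=30.0, interval=0.5):
    """Return True once host:port accepts TCP connections, False on timeout."""
    deadline = time.monotonic() + wait_secs
    while True:
        try:
            # A successful connect means something is listening on the port.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)


# Demo against a local listener so the sketch runs without RabbitMQ.
listener = socket.socket()
listener.bind(("127.0.0.1", 0))
listener.listen(1)
ready = wait_for_port("127.0.0.1", listener.getsockname()[1], wait_secs=5.0)
listener.close()
print(ready)
```

Note that an open port only shows the broker is accepting TCP connections; it does not prevent the connection from dropping later, so this addresses startup ordering rather than disconnects in general.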
I have a test suite with about 300 test cases. These test cases are HTTP API Calls. All are 'GET' API calls. The initial test cases execute fine. But towards the end of the execution, the error "error: [Errno 24] Too many open files" is thrown. This causes the test cases to fail even though we do not have any functional issue in the system under test.
How can I fix this issue?
The error in the report is:
error: [Errno 24] Too many open files
08:55:00.484 DEBUG Traceback (most recent call last):
File "/Library/Python/2.7/site-packages/HttpLibrary/__init__.py", line 229, in GET
self.app.get(path, {}, self.context.request_headers)
File "/Library/Python/2.7/site-packages/webtest/app.py", line 286, in get
File "/Library/Python/2.7/site-packages/HttpLibrary/livetest.py", line 153, in do_request
File "/Library/Python/2.7/site-packages/HttpLibrary/livetest.py", line 126, in _do_httplib_request
File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/httplib.py", line 973, in request
File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/httplib.py", line 1007, in _send_request
File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/httplib.py", line 969, in endheaders
File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/httplib.py", line 829, in _send_output
File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/httplib.py", line 791, in send
File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/httplib.py", line 772, in connect
File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/socket.py", line 571, in create_connection
You have to increase the maximum number of open files allowed on your machine.
Here is an article on how this can be accomplished in Ubuntu.
A similar problem has been answered on SO.
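For a single Python process, the limit can also be inspected and raised from inside the process with the standard resource module (Unix only). This is a hedged sketch rather than a recommended setting: raise_open_file_limit and the value 4096 are arbitrary choices for illustration, and on some systems (macOS in particular) the kernel may reject values that the reported hard limit appears to allow.

```python
import resource


def raise_open_file_limit(wanted=4096):
    """Raise the soft RLIMIT_NOFILE towards `wanted`, never above the hard limit."""
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    # Clamp the request to the hard limit unless the hard limit is unlimited.
    target = wanted if hard == resource.RLIM_INFINITY else min(wanted, hard)
    # Only ever raise the soft limit; leave it alone if it is already high enough.
    if soft != resource.RLIM_INFINITY and target > soft:
        resource.setrlimit(resource.RLIMIT_NOFILE, (target, hard))
    return resource.getrlimit(resource.RLIMIT_NOFILE)[0]


before = resource.getrlimit(resource.RLIMIT_NOFILE)[0]
after = raise_open_file_limit()
print(before, after)
```

The change applies only to the current process and its children, so it would need to run in the process that executes the test suite.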
Increasing the maximum number of open files (as proposed by Hinata) is indeed a short-term solution, but I am surprised that you hit this limit in the first place. I don't see why successive GET requests would open so many files.
My recommendation would be to try another library to check whether the problem remains. You can try the Robot Framework Requests library, or you can call the Python Requests library directly. You might want to take a look at a short blog post I wrote about this topic.
For some reason the ReplicaSet's Monitor appeared dead when a refresh was scheduled.
I've got the following traceback in a call to find_one():
File "pymongo/collection.py", line 604, in find_one
for result in self.find(spec_or_id, *args, **kwargs).limit(-1):
File "pymongo/cursor.py", line 904, in next
if len(self.__data) or self._refresh():
File "pymongo/cursor.py", line 848, in _refresh
self.__uuid_subtype))
File "pymongo/cursor.py", line 805, in __send_message
client.disconnect()
File "pymongo/mongo_replica_set_client.py", line 1255, in disconnect
self.__schedule_refresh()
File "pymongo/mongo_replica_set_client.py", line 1067, in __schedule_refresh
self.__monitor.schedule_refresh()
File "pymongo/mongo_replica_set_client.py", line 295, in schedule_refresh
"Monitor thread is dead: Perhaps started before a fork?")
I studied the code a bit and found out that Monitor.monitor() contains:
# RSC has been collected or there
# was an unexpected error.
except:
break
Which means that whatever went wrong, I will never find out what it was.
So, what should I do if I catch InvalidOperation("Monitor thread is dead: Perhaps started before a fork?")?
Is there some nice way to restart the Monitor instance?
(I use flask-pymongo with pymongo.version 2.6.2)
I'm not sure of the exact cause of your problem, but it sounds like a bug I fixed in PyMongo 2.7, issue PYTHON-549, "recreate monitors". Please upgrade.
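Separately from upgrading, the hint in the error message ("Perhaps started before a fork?") suggests a general pattern: build the client lazily in each process instead of once before forking. The sketch below illustrates only that pattern. PerProcess is a hypothetical helper, the factory is a plain object() stand-in rather than a real PyMongo client, and how this would be wired into flask-pymongo is not covered here.

```python
import os


class PerProcess:
    """Create a resource lazily and rebuild it when the process ID changes."""

    def __init__(self, factory):
        self._factory = factory
        self._pid = None
        self._resource = None

    def get(self):
        pid = os.getpid()
        if self._pid != pid:
            # First use in this process (or first use after a fork): build anew.
            self._resource = self._factory()
            self._pid = pid
        return self._resource


holder = PerProcess(object)  # object() stands in for a client constructor
first = holder.get()
same = holder.get() is first
holder._pid = -1  # simulate the PID change a forked child would see
rebuilt = holder.get() is not first
print(same, rebuilt)
```

With this shape, a child process never reuses background threads or sockets that were created in its parent, because its first call to get() constructs a fresh resource.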
I'm trying to figure out what causes this error when I run my app using the basic Flask server during development. I start it with this:
from myapp import app
app.run(debug=True, port=5001)
All is well and I'll continue to code and refresh etc, but then after a while I get the recursion error and have to Ctrl-C the server and restart it. Not a big deal, just a little annoying to have to deal with every now and then.
Here's the full traceback, which I tried to use to determine the cause but can't see anything that stands out (possibly something to do with how werkzeug uses Cookie.py?):
Traceback (most recent call last):
File "/Users/jeff/.virtualenvs/fmll/lib/python2.7/site-packages/flask/app.py", line 1701, in __call__
return self.wsgi_app(environ, start_response)
File "/Users/jeff/.virtualenvs/fmll/lib/python2.7/site-packages/werkzeug/wsgi.py", line 411, in __call__
return self.app(environ, start_response)
(last bit repeated a bunch - trimmed to fit in posting size requirements)
File "/Users/jeff/.virtualenvs/fmll/lib/python2.7/site-packages/flask/app.py", line 1685, in wsgi_app
with self.request_context(environ):
File "/Users/jeff/.virtualenvs/fmll/lib/python2.7/site-packages/flask/ctx.py", line 274, in __enter__
self.push()
File "/Users/jeff/.virtualenvs/fmll/lib/python2.7/site-packages/flask/ctx.py", line 238, in push
self.session = self.app.open_session(self.request)
File "/Users/jeff/.virtualenvs/fmll/lib/python2.7/site-packages/flask/app.py", line 792, in open_session
return self.session_interface.open_session(self, request)
File "/Users/jeff/.virtualenvs/fmll/lib/python2.7/site-packages/flask/sessions.py", line 191, in open_session
secret_key=key)
File "/Users/jeff/.virtualenvs/fmll/lib/python2.7/site-packages/werkzeug/contrib/securecookie.py", line 309, in load_cookie
data = request.cookies.get(key)
File "/Users/jeff/.virtualenvs/fmll/lib/python2.7/site-packages/werkzeug/utils.py", line 77, in __get__
value = self.func(obj)
File "/Users/jeff/.virtualenvs/fmll/lib/python2.7/site-packages/werkzeug/wrappers.py", line 418, in cookies
cls=self.dict_storage_class)
File "/Users/jeff/.virtualenvs/fmll/lib/python2.7/site-packages/werkzeug/http.py", line 741, in parse_cookie
cookie.load(header)
File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/Cookie.py", line 632, in load
self.__ParseString(rawdata)
File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/Cookie.py", line 665, in __ParseString
self.__set(K, rval, cval)
File "/Users/jeff/.virtualenvs/fmll/lib/python2.7/site-packages/werkzeug/_internal.py", line 290, in _BaseCookie__set
morsel = self.get(key, _ExtendedMorsel())
File "/Users/jeff/.virtualenvs/fmll/lib/python2.7/site-packages/werkzeug/_internal.py", line 271, in __init__
Morsel.__init__(self)
File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/Cookie.py", line 438, in __init__
dict.__setitem__(self, K, "")
RuntimeError: maximum recursion depth exceeded while calling a Python object
Since it occurs during your development process, you could increase the recursion limit before starting your server, using:
import sys
sys.setrecursionlimit(2000)  # Choose the right figure for you here
# the value on my system is 1000, but this is platform-dependent
However, you should use it very carefully, and probably not in production unless you have a good knowledge of its impacts.
Ref : http://docs.python.org/2/library/sys.html#sys.setrecursionlimit
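If the higher limit is only needed around one piece of code, it can be raised temporarily and restored afterwards. A minimal sketch, assuming a context manager is acceptable in your setup; recursion_limit and depth are hypothetical names, and the figures 3000 and 2000 are arbitrary values chosen for the demo.

```python
import sys
from contextlib import contextmanager


@contextmanager
def recursion_limit(limit):
    """Temporarily set the interpreter's recursion limit, then restore it."""
    previous = sys.getrecursionlimit()
    sys.setrecursionlimit(limit)
    try:
        yield
    finally:
        sys.setrecursionlimit(previous)


def depth(n):
    """Recurse n levels deep and return how many levels were reached."""
    return 0 if n == 0 else 1 + depth(n - 1)


original = sys.getrecursionlimit()
with recursion_limit(max(original, 3000)):
    reached = depth(2000)
restored = sys.getrecursionlimit() == original
print(reached, restored)
```

This only hides the symptom: if the recursion comes from something that keeps growing while the development server runs, any fixed limit will eventually be exceeded again.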