pymongo: What to do if the monitor thread is dead?

For some reason, the ReplicaSet's Monitor appeared dead when a refresh was scheduled.
I got the following traceback in a call to find_one():
File "pymongo/collection.py", line 604, in find_one
for result in self.find(spec_or_id, *args, **kwargs).limit(-1):
File "pymongo/cursor.py", line 904, in next
if len(self.__data) or self._refresh():
File "pymongo/cursor.py", line 848, in _refresh
self.__uuid_subtype))
File "pymongo/cursor.py", line 805, in __send_message
client.disconnect()
File "pymongo/mongo_replica_set_client.py", line 1255, in disconnect
self.__schedule_refresh()
File "pymongo/mongo_replica_set_client.py", line 1067, in __schedule_refresh
self.__monitor.schedule_refresh()
File "pymongo/mongo_replica_set_client.py", line 295, in schedule_refresh
"Monitor thread is dead: Perhaps started before a fork?")
I studied the code a bit and found that Monitor.monitor() contains:
# RSC has been collected or there
# was an unexpected error.
except:
break
This means that whatever went wrong, I will never find out what it was.
So what should I do if I catch InvalidOperation("Monitor thread is dead: Perhaps started before a fork?")?
Is there a nice way to restart the Monitor instance?
(I use flask-pymongo, and pymongo.version is 2.6.2.)

I'm not sure of the exact cause of your problem, but it sounds like a bug I fixed in PyMongo 2.7, issue PYTHON-549, "recreate monitors". Please upgrade.
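If you are stuck on 2.6.x for now, here is a minimal workaround sketch (assuming the pymongo 2.x API; the URI, replica-set, database, and collection names are illustrative): catch the error and rebuild the client, which starts a fresh monitor thread.

from pymongo import MongoReplicaSetClient
from pymongo.errors import InvalidOperation

def make_client():
    # Placeholder URI and replica-set name; substitute your own.
    return MongoReplicaSetClient("host1:27017,host2:27017", replicaSet="rs0")

client = make_client()

def safe_find_one(spec):
    global client
    try:
        return client.mydb.mycoll.find_one(spec)
    except InvalidOperation as exc:
        if "Monitor thread is dead" not in str(exc):
            raise
        # Discard the broken client; a new one spawns a fresh monitor thread.
        client.close()
        client = make_client()
        return client.mydb.mycoll.find_one(spec)

Also make sure the client is created after any fork (e.g. once per worker process), since a monitor thread started before a fork does not survive into the child.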

Related

Vague question re "Bad file descriptor" in Flask with Peewee/psycopg2

tl;dr: An app that had been working fine is suddenly throwing a "Bad file descriptor" error with no other changes; I need advice on how to evaluate this.
I inherited an app that had been untouched for years, after the server crashed and I needed to move it to another machine. It's built with Flask, and uses Peewee to talk to a Postgres database over psycopg2. It has a bunch of other stuff--an Elasticsearch engine for searching, a lot of heavy JS on the front end--but that doesn't seem to be the problem here. The code is moderately complex, and I am not very knowledgeable about all of its pieces.
It took me a while to get it set up using the sketchy deployment instructions that had been left behind, but eventually I got it running, was able to get a test version working on a clean VM, and then deployed it on an actual server, using gunicorn and nginx. It's been working fine in production for a week. I'm using Debian Buster on all machines, with the most recent versions of all software.
I then decided to do some basic code cleanup and ran the entire app through a linter, before looking at some other changes the end user had requested. Unfortunately, after this, the app consistently fails at the same point with a "Bad file descriptor" error. This happens in a pre-run section, which parses a large XML file and saves the info to the database and to Elasticsearch; the app receives an XML upload, forks a few processes, and runs the parse/index process in the background.
I am subsequently unable to get past this error by any means. I have launched a clean VM and installed everything from scratch; I've reverted the git repo to before I linted the code. Same problem. I don't see how it can be a code issue, as it's now at the same point it was when I started. But I'm at a loss for what to do, and terrified that the production machine will fail.
The errors I get (trimming the first few lines that refer to places in the app itself) are:
[2021-03-14 14:40:11.699837] self.execute()
[2021-03-14 14:40:11.699878] File "/home/deploy/git/myapp/venv/lib/python3.7/site-packages/peewee.py", line 1906, in inner
[2021-03-14 14:40:11.699907] return method(self, database, *args, **kwargs)
[2021-03-14 14:40:11.699946] File "/home/deploy/git/myapp/venv/lib/python3.7/site-packages/peewee.py", line 1977, in execute
[2021-03-14 14:40:11.699976] return self._execute(database)
[2021-03-14 14:40:11.700004] File "/home/deploy/git/myapp/venv/lib/python3.7/site-packages/peewee.py", line 2149, in _execute
[2021-03-14 14:40:11.700032] cursor = database.execute(self)
[2021-03-14 14:40:11.700060] File "/home/deploy/git/myapp/venv/lib/python3.7/site-packages/peewee.py", line 3156, in execute
[2021-03-14 14:40:11.700088] return self.execute_sql(sql, params, commit=commit)
[2021-03-14 14:40:11.700115] File "/home/deploy/git/myapp/venv/lib/python3.7/site-packages/peewee.py", line 3150, in execute_sql
[2021-03-14 14:40:11.700143] self.commit()
[2021-03-14 14:40:11.700171] File "/home/deploy/git/myapp/venv/lib/python3.7/site-packages/peewee.py", line 2916, in __exit__
[2021-03-14 14:40:11.700198] reraise(new_type, new_type(exc_value, *exc_args), traceback)
[2021-03-14 14:40:11.700226] File "/home/deploy/git/myapp/venv/lib/python3.7/site-packages/peewee.py", line 190, in reraise
[2021-03-14 14:40:11.700254] raise value.with_traceback(tb)
[2021-03-14 14:40:11.700282] File "/home/deploy/git/myapp/venv/lib/python3.7/site-packages/peewee.py", line 3143, in execute_sql
[2021-03-14 14:40:11.700309] cursor.execute(sql, params or ())
[2021-03-14 14:40:11.700339] OperationalError('SSL SYSCALL error: Bad file descriptor\n')
127.0.0.1 - - [14/Mar/2021 10:40:11] "POST /manage/versions/upload HTTP/1.1" 500 -
Error on request:
Traceback (most recent call last):
File "/home/deploy/git/myapp/venv/lib/python3.7/site-packages/werkzeug/serving.py", line 323, in run_wsgi
execute(self.server.app)
File "/home/deploy/git/myapp/venv/lib/python3.7/site-packages/werkzeug/serving.py", line 315, in execute
write(data)
File "/home/deploy/git/myapp/venv/lib/python3.7/site-packages/werkzeug/serving.py", line 273, in write
self.send_response(code, msg)
File "/home/deploy/git/myapp/venv/lib/python3.7/site-packages/werkzeug/serving.py", line 388, in send_response
self.wfile.write(hdr.encode("ascii"))
File "/usr/lib/python3.7/socketserver.py", line 799, in write
self._sock.sendall(b)
OSError: [Errno 9] Bad file descriptor
Exception in thread Thread-22:
Traceback (most recent call last):
File "/usr/lib/python3.7/threading.py", line 917, in _bootstrap_inner
self.run()
File "/usr/lib/python3.7/threading.py", line 865, in run
self._target(*self._args, **self._kwargs)
File "/usr/lib/python3.7/socketserver.py", line 654, in process_request_thread
self.shutdown_request(request)
File "/usr/lib/python3.7/socketserver.py", line 509, in shutdown_request
self.close_request(request)
File "/usr/lib/python3.7/socketserver.py", line 513, in close_request
request.close()
File "/usr/lib/python3.7/socket.py", line 420, in close
self._real_close()
File "/usr/lib/python3.7/socket.py", line 414, in _real_close
_ss.close(self)
OSError: [Errno 9] Bad file descriptor
I note that the final section ("Exception in thread Thread-22") shows the system Python rather than my virtual environment; I don't know if that's relevant, or if that's just what's running some overall process. I didn't do anything different to get to this point, though--the app is running in the virtual environment.
I'd be very grateful for any thoughts here--I'm obviously hoping it's some kind of stupid permission error or something, as I can't easily go into the code because of its complexity.
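For what it's worth, a classic cause of "SSL SYSCALL error: Bad file descriptor" in an app that forks is the child process inheriting the parent's open Postgres connection, so the two processes end up sharing one socket. A minimal sketch of keeping the connections separate with Peewee (the database name, credentials, and worker function are illustrative, not taken from the app above):

import os
from peewee import PostgresqlDatabase

db = PostgresqlDatabase("myapp", user="deploy")

def run_parse_in_background(xml_path):
    # Close the connection in the parent first, so the child does not
    # inherit a live Postgres socket.
    db.close()
    pid = os.fork()
    if pid == 0:  # child process
        db.connect()  # fresh connection owned by the child
        try:
            parse_and_index(xml_path)  # hypothetical worker function
        finally:
            db.close()
            os._exit(0)
    # Parent continues; it reconnects on its next query (or call db.connect()).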

RabbitMQ Python script: Socket closed when connection was open

Need some help! While running a Python script that uses RabbitMQ RPC, I am getting a "Socket 104, Socket closed when connection was open" error. Below are the Python traceback and some code:
Traceback (most recent call last):
File "./server.py", line 34, in <module>
channel.start_consuming()
File "/usr/lib/python2.6/site-packages/pika/adapters/blocking_connection.py", line 1681, in start_consuming
self.connection.process_data_events(time_limit=None)
File "/usr/lib/python2.6/site-packages/pika/adapters/blocking_connection.py", line 656, in process_data_events
self._dispatch_channel_events()
File "/usr/lib/python2.6/site-packages/pika/adapters/blocking_connection.py", line 469, in _dispatch_channel_events
impl_channel._get_cookie()._dispatch_events()
File "/usr/lib/python2.6/site-packages/pika/adapters/blocking_connection.py", line 1310, in _dispatch_events
evt.body)
File "./server.py", line 30, in on_request
body=json.dumps(DEVICE_INFO))
File "/usr/lib/python2.6/site-packages/pika/adapters/blocking_connection.py", line 1978, in basic_publish
mandatory, immediate)
File "/usr/lib/python2.6/site-packages/pika/adapters/blocking_connection.py", line 2065, in publish
self._flush_output()
File "/usr/lib/python2.6/site-packages/pika/adapters/blocking_connection.py", line 1174, in _flush_output
*waiters)
File "/usr/lib/python2.6/site-packages/pika/adapters/blocking_connection.py", line 395, in _flush_output
raise exceptions.ConnectionClosed()
pika.exceptions.ConnectionClosed
Apologies, as I am unable to comment due to low reputation. Could you provide a little more information on how you are opening your connection? Is it really open?
It might be due to a loss of connection with the RabbitMQ server, as pika doesn't handle disconnects and often produces a similar stack trace.
I also had a similar problem. In my case it was because my pika connection was dropping after some time, and my colleague was able to deal with this by adding a wait time for mq:port_number.
We were using a Docker container, so we added the following line to our invoke.sh to wait for mq:
filename.py --wait-secs 30 --port-wait mq:5672
I hope you are able to resolve this after doing that.
Otherwise, it would be better to check whether the connection is being dropped by pika before your Python script runs, or to provide more information on how you are invoking it.
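Since pika's BlockingConnection does not reconnect on its own, another common pattern is to wrap the consume loop and rebuild the connection whenever it drops. A minimal sketch using the pika 1.x API (the host, queue name, and heartbeat value are illustrative):

import time
import pika

def consume_forever(on_request):
    while True:
        try:
            connection = pika.BlockingConnection(
                pika.ConnectionParameters(host="mq", heartbeat=600))
            channel = connection.channel()
            channel.queue_declare(queue="rpc_queue")
            channel.basic_consume(queue="rpc_queue",
                                  on_message_callback=on_request)
            channel.start_consuming()
        except pika.exceptions.AMQPConnectionError:
            # Covers ConnectionClosed as well as failures to connect while
            # the broker is still starting; back off and rebuild.
            time.sleep(5)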

when running tests via nose paramiko gets stuck, how does nose handle globals?

All,
We often have this issue when running nose. Whenever nose encounters global variables, or imported modules that import modules which import modules... (no import loop), it gets stuck in some way or shows strange behaviour. I believe the error we're encountering at the moment is similar to that strange behaviour.
Running the test with unittest.main() gives no problems, but when I execute the test case with nose.run() I get a weird timeout from paramiko.
File "D:\test_for_nose.py", line 37, in __init__
conf = local_config.ConfigLoader("jefne_system_conf")
File "build\bdist.win32\egg\testframework\configurator\configurator.py", line 184, in __init__
File "build\bdist.win32\egg\testframework\configurator\configurator.py", line 197, in _create_dependencies
File "build\bdist.win32\egg\testframework\Engines\sshengine.py", line 137, in make_ssh_engine
File "build\bdist.win32\egg\testframework\Engines\sshengine.py", line 114, in create_ssh_session_obj_from_hostname
File "C:\Python27\lib\site-packages\paramiko-1.11.0-py2.7.egg\paramiko\client.py", line 342, in connect
self._auth(username, password, pkey, key_filenames, allow_agent, look_for_keys)
File "C:\Python27\lib\site-packages\paramiko-1.11.0-py2.7.egg\paramiko\client.py", line 524, in _auth
self._transport.auth_password(username, password)
File "C:\Python27\lib\site-packages\paramiko-1.11.0-py2.7.egg\paramiko\transport.py", line 1183, in auth_password
return self.auth_handler.wait_for_response(my_event)
File "C:\Python27\lib\site-packages\paramiko-1.11.0-py2.7.egg\paramiko\auth_handler.py", line 158, in wait_for_response
event.wait(0.1)
File "C:\Python27\lib\threading.py", line 618, in wait
self.__cond.wait(timeout)
File "C:\Python27\lib\threading.py", line 358, in wait
_sleep(delay)
KeyboardInterrupt
I'm totally lost here, but I'm quite sure it has to do with imports and globals. Is there a known problem with nose, threading, globals, multiple imports, or something resembling all this?
It looks like it is waiting for some kind of authentication response (username/password?) from you at the command prompt. Try running nosetests with -s to see what it wants; nose hijacks stdout by default, so you would not even see the prompt. It looks like your test establishes an SSH connection and falls through to interactive username/password login. Fix it by adding your key to the external machine's authorized_keys, or by some other approved method that avoids the interactive part.
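One way to rule out that interactive fallback is to make the paramiko connection fully non-interactive, so it fails fast instead of hanging under nose. A minimal sketch (the host, username, and key path are illustrative):

import paramiko

client = paramiko.SSHClient()
client.set_missing_host_key_policy(paramiko.AutoAddPolicy())
client.connect(
    "testhost.example.com",
    username="testuser",
    key_filename="/home/testuser/.ssh/id_rsa",
    allow_agent=False,     # don't fall back to an SSH agent
    look_for_keys=False,   # don't scan ~/.ssh for other keys
    timeout=10,            # fail fast instead of blocking the test run
)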

Blender Network Render Timeout

I've tried to render an animation using Network Render. I connected my PC and my laptop without further problems, but when I clicked "Render animation on network", the following error occurred after a few seconds:
AL lib: (EE) UpdateDeviceParams: Failed to set 44100hz, got 48000hz instead
Traceback (most recent call last):
File "F:\Program Files (x86)\Blender\2.74\scripts\addons\netrender\operat ors.py", line 85, in invoke
return self.execute(context)
File "F:\Program Files (x86)\Blender\2.74\scripts\addons\netrender\operat ors.py", line 77, in execute
scene.network_render.job_id = client.sendJob(conn, scene, True)
File "F:\Program Files (x86)\Blender\2.74\scripts\addons\netrender\client .py", line 121, in sendJob
return sendJobBlender(conn, scene, anim, can_save)
File "F:\Program Files (x86)\Blender\2.74\scripts\addons\netrender\client .py", line 340, in sendJobBlender
response = conn.getresponse()
File "F:\Program Files (x86)\Blender\2.74\python\lib\http\client.py", line 1172, in getresponse
response.begin()
File "F:\Program Files (x86)\Blender\2.74\python\lib\http\client.py", line 351, in begin
version, status, reason = self._read_status()
File "F:\Program Files (x86)\Blender\2.74\python\lib\http\client.py", line 313, in _read_status
line = str(self.fp.readline(_MAXLINE + 1), "iso-8859-1")
File "F:\Program Files (x86)\Blender\2.74\python\lib\socket.py", line 371, inreadinto
return self._sock.recv_into(b)
socket.timeout: timed out
I asked Google: somebody circumvented the problem by changing "the default timeout to 1000 (instead of 300) (in the socket.py file[...])". I can't find this line; I guess they changed it in the current version. Since I have no experience with Python, I do not know how I can change it now.
I hope you can help me!
The addon would be a better place to make the change, rather than the socket module. If you look in your addons folder you will find netrender/utils.py, which contains a few lines that use socket.setdefaulttimeout; you could make your adjustments there.
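A minimal sketch of that kind of adjustment (the 1000-second value mirrors the workaround quoted in the question; the exact lines in netrender/utils.py vary between Blender versions):

import socket

# Raise the default timeout used for netrender's HTTP connections,
# e.g. from 300 seconds to 1000 seconds.
socket.setdefaulttimeout(1000)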
An even better solution would be to look at why the connection is timing out; two computers in the same room should not get any timeouts. A common cause of timeouts is the inability to get a connection, and firewalls are good at stopping connections, so you may want to check that the port used by network render is allowing incoming connections, and that Blender is running with network render turned on to accept the connection. The default port is 8000, which could also be in use by another application; you can configure each computer to use a different port if needed.

DeadlineExceededError only when starting new instance (using webapp2)

I'm getting some weird behavior -- when the application starts up a new instance for the first time, I get a DeadlineExceededError. When I hit refresh in the browser it works just fine, and it doesn't matter which page I try. The strange thing is I can see all my debugging code just fine. In fact, I write to the log just prior to calling self.response, and it shows up in the console's log. This is pretty hard to troubleshoot, since I'm not having any page-load problems in the development environment, and the traceback is a bit opaque to me:
E 2013-09-29 00:10:03.975
Traceback (most recent call last):
File "/base/data/home/runtimes/python27/python27_lib/versions/1/google/appengine/runtime/wsgi.py", line 267, in Handle
for chunk in result:
File "/base/data/home/runtimes/python27/python27_lib/versions/1/google/appengine/ext/appstats/recording.py", line 1286, in appstats_wsgi_wrapper
end_recording(status)
File "/base/data/home/runtimes/python27/python27_lib/versions/1/google/appengine/ext/appstats/recording.py", line 1410, in end_recording
rec.save()
File "/base/data/home/runtimes/python27/python27_lib/versions/1/google/appengine/ext/appstats/recording.py", line 654, in save
key, len_part, len_full = self._save()
File "/base/data/home/runtimes/python27/python27_lib/versions/1/google/appengine/ext/appstats/recording.py", line 678, in _save
namespace=config.KEY_NAMESPACE)
File "/base/data/home/runtimes/python27/python27_lib/versions/1/google/appengine/api/memcache/__init__.py", line 1008, in set_multi
namespace=namespace)
File "/base/data/home/runtimes/python27/python27_lib/versions/1/google/appengine/api/memcache/__init__.py", line 907, in _set_multi_with_policy
status_dict = rpc.get_result()
File "/base/data/home/runtimes/python27/python27_lib/versions/1/google/appengine/api/apiproxy_stub_map.py", line 612, in get_result
return self.__get_result_hook(self)
File "/base/data/home/runtimes/python27/python27_lib/versions/1/google/appengine/api/memcache/__init__.py", line 974, in __set_with_policy_hook
rpc.check_success()
File "/base/data/home/runtimes/python27/python27_lib/versions/1/google/appengine/api/apiproxy_stub_map.py", line 578, in check_success
self.__rpc.CheckSuccess()
File "/base/data/home/runtimes/python27/python27_lib/versions/1/google/appengine/api/apiproxy_rpc.py", line 133, in CheckSuccess
raise self.exception
DeadlineExceededError
I 2013-09-29 00:10:03.988
This request caused a new process to be started for your application, and thus caused your application code to be loaded for the first time. This request may thus take longer and use more CPU than a typical request for your application.
I'm not sure how to even go about debugging this, since the error seems to be after all my code has already run.
Edit: I should add this:
I 2013-09-29 00:09:06.919
DEBUG: Writing output!
E 2013-09-29 00:10:03.975
You can see there's nearly a full minute between logging "Writing output!" just before self.response is called, and when the error occurs.
DeadlineExceededError happens in App Engine if a request to a frontend instance does not get a response within 60 seconds. So what must be happening in your case is that when there is no running instance and your app receives a new user request, a new instance is started to process it. The overall response time is then the instance startup time (library loading, initial data access, and so on) plus the time to process the user request, and this causes the DeadlineExceededError. When you access your app again immediately afterwards, there is an already-running instance, so the response time is just the time to process the user request, and you get no error.
Please check the suggested approaches for handling DeadlineExceededError, including warmup requests, which keep an instance ready before a live user request arrives.
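On the Python 2.7 runtime, warmup requests are enabled by adding "inbound_services:" with a "- warmup" entry to app.yaml, after which App Engine sends GET /_ah/warmup to a new instance before routing live traffic to it. A minimal sketch of the matching webapp2 handler (the handler name is illustrative):

import webapp2

class WarmupHandler(webapp2.RequestHandler):
    def get(self):
        # Do the expensive one-time work here (imports, caches, config)
        # so the first live request doesn't pay the startup cost.
        self.response.write("warmed up")

app = webapp2.WSGIApplication([
    ("/_ah/warmup", WarmupHandler),
])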
