Luigi framework crash - python

When I run Luigi tasks, the framework sometimes crashes, which causes all of the following tasks to fail. Here is the error log:
2017-10-05 22:02:02,564 luigi-interface WARNING Failed pinging scheduler
2017-10-05 22:02:03,129 requests.packages.urllib3.connectionpool INFO Starting new HTTP connection (126): localhost
2017-10-05 22:02:03,130 luigi-interface ERROR Failed connecting to remote scheduler 'http://localhost:8082'
Traceback (most recent call last):
...
File "/home/develop/data_warehouse/venv/local/lib/python2.7/site-packages/requests/sessions.py", line 585, in send
r = adapter.send(request, **kwargs)
File "/home/develop/data_warehouse/venv/local/lib/python2.7/site-packages/requests/adapters.py", line 467, in send
raise ConnectionError(e, request=request)
ConnectionError: HTTPConnectionPool(host='localhost', port=8082): Max retries exceeded with url: /api/add_worker (Caused by NewConnectionError('<requests.packages.urllib3.connection.HTTPConnection object at 0x7f15128cb3d0>: Failed to establish a new connection: [Errno 111] Connection refused',))
2017-10-05 22:02:03,180 luigi-interface INFO Worker Worker(salt=150908931, workers=3, host=etl2, username=develop, pid=18019) was stopped. Shutting down Keep-Alive thread
Traceback (most recent call last):
File "app_metadata.py", line 1567, in <module>
luigi.run()
File "/home/develop/data_warehouse/venv/local/lib/python2.7/site-packages/luigi/interface.py", line 210, in run
return _run(*args, **kwargs)['success']
File "/home/develop/data_warehouse/venv/local/lib/python2.7/site-packages/luigi/interface.py", line 238, in _run
return _schedule_and_run([cp.get_task_obj()], worker_scheduler_factory)
File "/home/develop/data_warehouse/venv/local/lib/python2.7/site-packages/luigi/interface.py", line 197, in _schedule_and_run
success &= worker.run()
File "/home/develop/data_warehouse/venv/local/lib/python2.7/site-packages/luigi/worker.py", line 867, in run
self._add_worker()
File "/home/develop/data_warehouse/venv/local/lib/python2.7/site-packages/luigi/worker.py", line 652, in _add_worker
self._scheduler.add_worker(self._id, self._worker_info)
File "/home/develop/data_warehouse/venv/local/lib/python2.7/site-packages/luigi/rpc.py", line 219, in add_worker
return self._request('/api/add_worker', {'worker': worker, 'info': info})
File "/home/develop/data_warehouse/venv/local/lib/python2.7/site-packages/luigi/rpc.py", line 146, in _request
page = self._fetch(url, body, log_exceptions, attempts)
File "/home/develop/data_warehouse/venv/local/lib/python2.7/site-packages/luigi/rpc.py", line 138, in _fetch
last_exception
luigi.rpc.RPCError: Errors (3 attempts) when connecting to remote scheduler 'http://localhost:8082'
It looks like the worker tries to ping the central scheduler and fails, then crashes, after which all later tasks are blocked and cannot run successfully.
Someone else has hit a similar error, but their resolution does not work for me:
Github - Failed connecting to remote scheduler #1894

I would try making the timeout a little longer if your central scheduler is getting overloaded. You could also increase retries and retry wait time.
In luigi.cfg:
[core]
rpc-connect-timeout=60.0  # default is 10.0
rpc-retry-attempts=10     # default is 3
rpc-retry-wait=60         # default is 30
You may also want to add a watchdog so that the scheduler process automatically restarts on a crash; a sketch follows.
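A minimal watchdog sketch (an illustration, not a luigi feature), assuming the central scheduler is started with the standard luigid command:

import subprocess
import time

while True:
    # launch the central scheduler; luigid is luigi's scheduler daemon
    scheduler = subprocess.Popen(["luigid", "--port", "8082"])
    scheduler.wait()  # blocks until the scheduler process exits
    time.sleep(5)     # brief pause before restarting it

In practice a process supervisor such as systemd or supervisord does the same job more robustly.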

Have you configured the central scheduler properly? See the docs:
https://luigi.readthedocs.io/en/stable/central_scheduler.html
If not, try using the local scheduler by specifying --local-scheduler from the command line.
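For example (MyTask is a placeholder for your actual task name):

python app_metadata.py MyTask --local-scheduler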

Related

Stanford CoreNLP server using 12 threads, 6 clients calling it and yet sometimes “Only one usage of each socket address is normally permitted” error

The Stanford CoreNLP server is using 12 threads, I have 6 single-threaded clients calling it, and yet I sometimes get the error message:
Failed to establish a new connection: [WinError 10048] Only one usage of each socket address (protocol/network address/port) is normally permitted'))
How is it possible? I thought it would be safe to run up to 12 clients simultaneously since the Stanford CoreNLP server is using 12 threads and my clients only use 1 thread each.
I launch the Stanford CoreNLP server using:
java -mx4g -cp "*" edu.stanford.nlp.pipeline.StanfordCoreNLPServer -port 9001 -timeout 50000
which starts the Stanford CoreNLP server with 12 threads as I have 12 CPU cores (and I can see that Stanford CoreNLP server mentions it will use 12 threads).
The code I used to call the Stanford CoreNLP server is:
import os
import json
from pycorenlp import StanfordCoreNLP
import time
import sys

nlp = StanfordCoreNLP('http://localhost:9001')
total_start_time = time.time()
for i in range(9999999):
    text = 'without the dataset the paper {0} is useless'.format(i)
    print('text: {0}'.format(text))
    start_time = time.time()
    output = nlp.annotate(text, properties={
        'annotators': 'ner',
        'outputFormat': 'json'
    })
    elapsed_time = time.time() - start_time
    print('elapsed_time: {0:.4f} seconds'.format(elapsed_time))
    print('total_start_time: {0:.4f} seconds'.format(time.time() - total_start_time))
The entire error stack is below. The script is called parse_captions.py and it has a few more lines of code than the code I gave above, so the line numbers don't match. Also, the text being parsed is different.
text: anniversary of the liquidation of the litzmanstadt ghetto in lodz
Traceback (most recent call last):
File "C:\ProgramData\Anaconda3\lib\site-packages\urllib3\connection.py", line 159, in _new_conn
(self._dns_host, self.port), self.timeout, **extra_kw)
File "C:\ProgramData\Anaconda3\lib\site-packages\urllib3\util\connection.py", line 80, in create_connection
raise err
File "C:\ProgramData\Anaconda3\lib\site-packages\urllib3\util\connection.py", line 70, in create_connection
sock.connect(sa)
OSError: [WinError 10048] Only one usage of each socket address (protocol/network address/port) is normally permitted
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "C:\ProgramData\Anaconda3\lib\site-packages\urllib3\connectionpool.py", line 600, in urlopen
chunked=chunked)
File "C:\ProgramData\Anaconda3\lib\site-packages\urllib3\connectionpool.py", line 354, in _make_request
conn.request(method, url, **httplib_request_kw)
File "C:\ProgramData\Anaconda3\lib\http\client.py", line 1229, in request
self._send_request(method, url, body, headers, encode_chunked)
File "C:\ProgramData\Anaconda3\lib\http\client.py", line 1275, in _send_request
self.endheaders(body, encode_chunked=encode_chunked)
File "C:\ProgramData\Anaconda3\lib\http\client.py", line 1224, in endheaders
self._send_output(message_body, encode_chunked=encode_chunked)
File "C:\ProgramData\Anaconda3\lib\http\client.py", line 1016, in _send_output
self.send(msg)
File "C:\ProgramData\Anaconda3\lib\http\client.py", line 956, in send
self.connect()
File "C:\ProgramData\Anaconda3\lib\site-packages\urllib3\connection.py", line 181, in connect
conn = self._new_conn()
File "C:\ProgramData\Anaconda3\lib\site-packages\urllib3\connection.py", line 168, in _new_conn
self, "Failed to establish a new connection: %s" % e)
urllib3.exceptions.NewConnectionError: <urllib3.connection.HTTPConnection object at 0x000002660423E780>: Failed to establish a new connection: [WinError 10048] Only one usage of each socket address (protocol/network address/port) is normally permitted
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "C:\ProgramData\Anaconda3\lib\site-packages\requests\adapters.py", line 449, in send
timeout=timeout
File "C:\ProgramData\Anaconda3\lib\site-packages\urllib3\connectionpool.py", line 638, in urlopen
_stacktrace=sys.exc_info()[2])
File "C:\ProgramData\Anaconda3\lib\site-packages\urllib3\util\retry.py", line 398, in increment
raise MaxRetryError(_pool, url, error or ResponseError(cause))
urllib3.exceptions.MaxRetryError: HTTPConnectionPool(host='localhost', port=9001): Max retries exceeded with url: /?properties=%7B%27annotators%27%3A+%27tokenize%2Cssplit%2Cpos%2Cdepparse%2Ctruecase%2Cparse%27%2C+%27outputFormat%27%3A+%27json%27%7D (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x000002660423E780>: Failed to establish a new connection: [WinError 10048] Only one usage of each socket address (protocol/network address/port) is normally permitted'))
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "parse_captions.py", line 52, in <module>
main()
File "parse_captions.py", line 37, in main
'outputFormat': 'json'
File "C:\ProgramData\Anaconda3\lib\site-packages\pycorenlp\corenlp.py", line 29, in annotate
}, data=data, headers={'Connection': 'close'})
File "C:\ProgramData\Anaconda3\lib\site-packages\requests\api.py", line 116, in post
return request('post', url, data=data, json=json, **kwargs)
File "C:\ProgramData\Anaconda3\lib\site-packages\requests\api.py", line 60, in request
return session.request(method=method, url=url, **kwargs)
File "C:\ProgramData\Anaconda3\lib\site-packages\requests\sessions.py", line 533, in request
resp = self.send(prep, **send_kwargs)
File "C:\ProgramData\Anaconda3\lib\site-packages\requests\sessions.py", line 646, in send
r = adapter.send(request, **kwargs)
File "C:\ProgramData\Anaconda3\lib\site-packages\requests\adapters.py", line 516, in send
raise ConnectionError(e, request=request)
requests.exceptions.ConnectionError: HTTPConnectionPool(host='localhost', port=9001): Max retries exceeded with url: /?properties=%7B%27annotators%27%3A+%27tokenize%2Cssplit%2Cpos%2Cdepparse%2Ctruecase%2Cparse%27%2C+%27outputFormat%27%3A+%27json%27%7D (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x000002660423E780>: Failed to establish a new connection: [WinError 10048] Only one usage of each socket address (protocol/network address/port) is normally permitted'))
I use Python 3.7.3 x64, Stanford CoreNLP version 3.9.2 (i.e., 2018-10-05) and Microsoft Windows 10 version 1809 "Professional".
My first guess is that this may be a limitation of the way pycorenlp handles networking relative to how networking works on Windows.
https://help.socketlabs.com/docs/how-to-fix-error-only-one-usage-of-each-socket-address-protocolnetwork-addressport-is-normally-permitted
I don't know how pycorenlp operates, as it is not officially supported by Stanford NLP.
Perhaps it is opening a new connection for every query it sends. That would quickly use up the available connections, especially if they hang around for 4 minutes once they are finished.
A similar thing happens if I use Stanford NLP's software on Windows 10, stanza, after about 14000 iterations:
from stanza.server.client import CoreNLPClient

with CoreNLPClient(annotators="tokenize", be_quiet=True, classpath='$CLASSPATH') as nlp:
    print("Hello world")
    for i in range(1000000):
        if i % 1000 == 0:
            print(i)
        nlp.annotate("Unban mox opal")
11000
12000
13000
14000
Traceback (most recent call last):
File "C:\Program Files\Python38\lib\site-packages\urllib3\connection.py", line 159, in _new_conn
conn = connection.create_connection(
File "C:\Program Files\Python38\lib\site-packages\urllib3\util\connection.py", line 84, in create_connection
raise err
File "C:\Program Files\Python38\lib\site-packages\urllib3\util\connection.py", line 74, in create_connection
sock.connect(sa)
OSError: [WinError 10048] Only one usage of each socket address (protocol/network address/port) is normally permitted
It seems like the discount solution for this problem would be to make one of the suggested registry changes to reduce the lifespan of a closed connection. The workaround which involves more work on your part would be to send more text per request. The fancy solution on our end would be to find some way to reuse the existing connections instead of opening & closing them repeatedly. In stanza, for example, we use requests.post for each query. Presumably opening a connection and keeping it over the course of multiple queries is possible and more efficient. However, I can tell you that won't happen any time soon for stanza, and no one here has any experience working with pycorenlp.
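As an illustration of that last idea (not something pycorenlp or stanza currently does), here is a minimal sketch that keeps one requests.Session open so the TCP connection is reused across queries instead of being reopened each time. The URL and annotators mirror the question; the properties encoding is my assumption about the server's query interface:

import json
import requests

session = requests.Session()  # keep-alive: reuses the underlying TCP connection

def annotate(text):
    # the CoreNLP server reads annotation settings from a JSON-encoded
    # 'properties' query parameter, which is also what pycorenlp sends
    resp = session.post('http://localhost:9001',
                        params={'properties': json.dumps({'annotators': 'ner',
                                                          'outputFormat': 'json'})},
                        data=text.encode('utf-8'))
    resp.raise_for_status()
    return resp.json()

for i in range(100000):
    annotate('without the dataset the paper {0} is useless'.format(i))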
In general you will get a faster response by posting issues on github.com/stanfordnlp/CoreNLP or github.com/stanfordnlp/stanza

Catching Firebase 504 gateway timeout

I'm building a simple IoT device (with a Raspberry Pi Zero) which pulls data from the Firebase Realtime Database every second and checks for updates.
However, after a certain time (not sure exactly how much but somewhere between 1 hour and 3 hours) the program exits with a 504 Server Error: Gateway Time-out message.
I couldn't understand exactly why this is happening, I tried to recreate this error by disconnecting the Pi from the internet and I did not get this message. Instead, the program simply paused in a ref.get() line and automatically resumed running once the connection was back.
This device is meant to be always on, so ideally if I get some kind of error, I would like to restart the program / reinitiate the connection / reboot the Pi. Is there a way to achieve something like this?
It seems like the message is actually generated by the firebase_admin package.
Here is the error message:
Traceback (most recent call last):
File "/home/pi/.local/lib/python3.7/site-packages/firebase_admin/db.py", line 944, in request
return super(_Client, self).request(method, url, **kwargs)
File "/home/pi/.local/lib/python3.7/site-packages/firebase_admin/_http_client.py", line 105, in request
resp.raise_for_status()
File "/usr/lib/python3/dist-packages/requests/models.py", line 940, in raise_for_status
raise HTTPError(http_error_msg, response=self)
requests.exceptions.HTTPError: 504 Server Error: Gateway Time-out for url: https://someFirebaseProject.firebaseio.com/someRef/subSomeRef/payload.json
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/home/pi/Desktop/project/main.py", line 94, in <module>
lastUpdate = ref.get()['lastUpdate']
File "/home/pi/.local/lib/python3.7/site-packages/firebase_admin/db.py", line 223, in get
return self._client.body('get', self._add_suffix(), params=params)
File "/home/pi/.local/lib/python3.7/site-packages/firebase_admin/_http_client.py", line 117, in body
resp = self.request(method, url, **kwargs)
File "/home/pi/.local/lib/python3.7/site-packages/firebase_admin/db.py", line 946, in request
raise _Client.handle_rtdb_error(error)
firebase_admin.exceptions.UnknownError: Internal server error.
>>>
To reboot the whole Raspberry Pi, you can just run a shell command:
import os
os.system("sudo reboot")
I've had this problem too and usually feel safer with that, but there are obvious downsides. I'd try resetting the Wi-Fi connection or the network interface in a similar way; a retry sketch follows.
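A minimal retry sketch for the exception in the traceback (assuming ref is the firebase_admin database reference from the question; the retry count and delay are arbitrary):

import os
import time
import firebase_admin.exceptions

def get_with_retry(ref, retries=5, delay=10):
    for _ in range(retries):
        try:
            return ref.get()
        except firebase_admin.exceptions.UnknownError:
            time.sleep(delay)  # transient 504: back off and try again
    os.system("sudo reboot")   # still failing after all retries: last resort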

Connection to remote MySQL db from Python 3.4

I'm trying to connect to two MySQL databases (one local, one remote) at the same time using Python 3.4 but I'm really struggling. Splitting the problem into three:
Step 1: connect to the local DB. This is working fine using PyMySQL. (MySQLdb isn't compatible with Python 3.4, of course.)
Step 2: connect to the remote DB (which needs to use SSH). I can get it to work from the Linux command prompt but not from Python... see below.
Step 3: connect to both at the same time. I think I'm supposed to use a different port for the remote database so that I can have both connections at the same time, but I'm out of my depth here! If it's relevant, the two DBs will have different names. And if this question isn't directly related, please tell me and I'll post it separately.
Unfortunately I'm not really starting in the right place for a newbie... once I can get this working I can happily go back to basic Python and SQL but hopefully someone will take pity on me and give me a hand to get started!
For Step 2, my code is below. It seems to be quite close to the sshtunnel example which answers this question Python - SSH Tunnel Setup and MySQL DB Access - though that uses MySQLdb. For the moment I'm embedding the connection parameters – I'll move them to the config file once it's working properly.
import dropbox, pymysql, shlex, shutil, subprocess
from sshtunnel import SSHTunnelForwarder
import iot_config as cfg

def CloseLocalDB():
    localcur.close()
    localdb.close()

def CloseRemoteDB():
    # Disconnect from the database
    # remotecur.close()
    # remotedb.close()
    # Close the SSH tunnel
    # ssh.close()
    print("end of CloseRemoteDB function")

def OpenLocalDB():
    global localcur, localdb
    localdb = pymysql.connect(host=cfg.localdbconn['host'], user=cfg.localdbconn['user'], passwd=cfg.localdbconn['passwd'], db=cfg.localdbconn['db'])
    localcur = localdb.cursor()

def OpenRemoteDB():
    global remotecur, remotedb
    with SSHTunnelForwarder(
            ('my_remote_site', 22),
            ssh_username="my_ssh_username",
            ssh_private_key="/etc/ssh/my_private_key.ppk",
            ssh_private_key_password="my_private_key_password",
            remote_bind_address=('127.0.0.1', 3308)) as server:
        remotedb = None
        # Following line gives an error if uncommented
        # remotedb = pymysql.connect(host='127.0.0.1', user='remote_db_user', passwd='remote_db_password', db='remote_db_name', port=server.local_bind_port)
        # remotecur = remotedb.cursor()

# Main program starts here
OpenLocalDB()
CloseLocalDB()
OpenRemoteDB()
CloseRemoteDB()
This is the error I'm getting:
2016-04-21 19:13:33,487 | ERROR | Secsh channel 0 open FAILED: Connection refused: Connect failed
2016-04-21 19:13:33,553 | ERROR | In #1 <-- ('127.0.0.1', 60591) to ('127.0.0.1', 3308) failed: ChannelException(2, 'Connect failed')
----------------------------------------
Exception happened during processing of request from ('127.0.0.1', 60591)
Traceback (most recent call last):
File "/usr/local/lib/python3.4/dist-packages/sshtunnel.py", line 286, in handle
src_address)
File "/usr/local/lib/python3.4/dist-packages/paramiko/transport.py", line 834, in open_channel
raise e
paramiko.ssh_exception.ChannelException: (2, 'Connect failed')
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/usr/lib/python3.4/socketserver.py", line 613, in process_request_thread
self.finish_request(request, client_address)
File "/usr/lib/python3.4/socketserver.py", line 344, in finish_request
self.RequestHandlerClass(request, client_address, self)
File "/usr/lib/python3.4/socketserver.py", line 669, in __init__
self.handle()
File "/usr/local/lib/python3.4/dist-packages/sshtunnel.py", line 296, in handle
raise HandlerSSHTunnelForwarderError(msg)
sshtunnel.HandlerSSHTunnelForwarderError: In #1 <-- ('127.0.0.1', 60591) to ('127.0.0.1', 3308) failed: ChannelException(2, 'Connect failed')
----------------------------------------
Traceback (most recent call last):
File "/home/pi/Documents/iot_pm2/iot_ssh_example_for_help.py", line 38, in <module>
OpenRemoteDB()
File "/home/pi/Documents/iot_pm2/iot_ssh_example_for_help.py", line 32, in OpenRemoteDB
remotedb = pymysql.connect(host='127.0.0.1', user='remote_db_user', passwd='remote_db_password', db='remote_db_name', port=server.local_bind_port)
File "/usr/local/lib/python3.4/dist-packages/pymysql/__init__.py", line 88, in Connect
return Connection(*args, **kwargs)
File "/usr/local/lib/python3.4/dist-packages/pymysql/connections.py", line 678, in __init__
self.connect()
File "/usr/local/lib/python3.4/dist-packages/pymysql/connections.py", line 889, in connect
self._get_server_information()
File "/usr/local/lib/python3.4/dist-packages/pymysql/connections.py", line 1190, in _get_server_information
packet = self._read_packet()
File "/usr/local/lib/python3.4/dist-packages/pymysql/connections.py", line 945, in _read_packet
packet_header = self._read_bytes(4)
File "/usr/local/lib/python3.4/dist-packages/pymysql/connections.py", line 981, in _read_bytes
2013, "Lost connection to MySQL server during query")
pymysql.err.OperationalError: (2013, 'Lost connection to MySQL server during query')
Thanks in advance.
Answering my own question because, with a lot of help from J.M. Fernández on Github, I have a solution: the example that I copied at the beginning uses port 3308 but port 3306 is the standard. Once I'd changed this it started working.
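For completeness, a sketch of the corrected tunnel (same placeholder host and credentials as above; the only change is binding to the standard MySQL port 3306 instead of 3308):

import pymysql
from sshtunnel import SSHTunnelForwarder

with SSHTunnelForwarder(
        ('my_remote_site', 22),
        ssh_username="my_ssh_username",
        ssh_private_key="/etc/ssh/my_private_key.ppk",
        ssh_private_key_password="my_private_key_password",
        remote_bind_address=('127.0.0.1', 3306)) as server:  # 3306, not 3308
    remotedb = pymysql.connect(host='127.0.0.1',
                               user='remote_db_user',
                               passwd='remote_db_password',
                               db='remote_db_name',
                               port=server.local_bind_port)
    remotecur = remotedb.cursor()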

Handling Non-SSL Traffic in Python/Tornado

I have a webservice running in python 2.7.10 / Tornado that uses SSL. This service throws an error when a non-SSL call comes through (http://...).
I don't want my service to be accessible when SSL is not used, but I'd like to handle it in a cleaner fashion.
Here is my main code that works great over SSL:
if __name__ == "__main__":
    tornado.options.parse_command_line()
    # does not work on 2.7.6
    ssl_ctx = ssl.create_default_context(ssl.Purpose.CLIENT_AUTH)
    ssl_ctx.load_cert_chain("...crt.pem", "...key.pem")
    ssl_ctx.load_verify_locations("...CA.crt.pem")
    http_server = tornado.httpserver.HTTPServer(application, ssl_options=ssl_ctx, decompress_request=True)
    http_server.listen(options.port)
    mainloop = tornado.ioloop.IOLoop.instance()
    print("Main Server started on port XXXX")
    mainloop.start()
and here is the error when I hit that server with http://... instead of https://...:
[E 151027 20:45:57 http1connection:700] Uncaught exception
Traceback (most recent call last):
File "/usr/local/lib/python2.7/dist-packages/tornado/http1connection.py", line 691, in _server_request_loop
ret = yield conn.read_response(request_delegate)
File "/usr/local/lib/python2.7/dist-packages/tornado/gen.py", line 807, in run
value = future.result()
File "/usr/local/lib/python2.7/dist-packages/tornado/concurrent.py", line 209, in result
raise_exc_info(self._exc_info)
File "/usr/local/lib/python2.7/dist-packages/tornado/gen.py", line 810, in run
yielded = self.gen.throw(*sys.exc_info())
File "/usr/local/lib/python2.7/dist-packages/tornado/http1connection.py", line 166, in _read_message
quiet_exceptions=iostream.StreamClosedError)
File "/usr/local/lib/python2.7/dist-packages/tornado/gen.py", line 807, in run
value = future.result()
File "/usr/local/lib/python2.7/dist-packages/tornado/concurrent.py", line 209, in result
raise_exc_info(self._exc_info)
File "<string>", line 3, in raise_exc_info
SSLError: [SSL: HTTP_REQUEST] http request (_ssl.c:590)
Any ideas how I should handle that exception?
And what the standard-conform return value would be when I catch a non-SSL call to an SSL-only API?
UPDATE
This API runs on a specific port, e.g. https://example.com:1234/. I want to inform a user who is trying to connect without SSL, e.g. http://example.com:1234/, that what they are doing is incorrect by returning an error message or status code. As it is, the uncaught exception returns a 500, which they could interpret as a programming error on my part. Any ideas?
There's an excellent discussion in this Tornado issue about that, where a Tornado maintainer says:
If you have both HTTP and HTTPS in the same tornado process, you must be running two separate HTTPServers (of course such a feature should not be tied to whether SSL is handled at the tornado level, since you could be terminating SSL in a proxy, but since your question stipulated that SSL was enabled in tornado let's focus on this case first). You could simply give the HTTP server a different Application, one that just does this redirect.
So the best solution is to run a second HTTPServer that listens on port 80 and doesn't have the ssl_options parameter set.
UPDATE
A request to https://example.com/some/path will go to port 443, where you must have an HTTPServer configured to handle https traffic; while a request to http://example.com/some/path will go to port 80, where you must have another instance of HTTPServer without ssl options, and this is where you must return the custom response code you want. That shouldn't raise any error.
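A minimal sketch of that second, plain-HTTP server (the handler name and response text are mine; a 400 with an explanatory body is one reasonable choice, a redirect to the https:// URL is another):

import tornado.httpserver
import tornado.ioloop
import tornado.web

class HttpsOnlyHandler(tornado.web.RequestHandler):
    def get(self):
        # tell the caller explicitly that plain HTTP is not supported
        self.set_status(400)
        self.finish("This API is only reachable over HTTPS.")

plain_app = tornado.web.Application([(r".*", HttpsOnlyHandler)])

# runs alongside the SSL-enabled HTTPServer from the question
http_server = tornado.httpserver.HTTPServer(plain_app)
http_server.listen(80)
tornado.ioloop.IOLoop.instance().start()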

From Raspberry PI using PIKA to put data to RabbitMQ

I have an issue. For my final-year university project, my Raspberry Pi takes sensor data, and I need to put that data into my RabbitMQ service on another machine. Currently I am using pika as the library to push the data into the queue. Right now I have a connection issue; this is the error I am getting:
WARNING:pika.adapters.base_connection:Could not connect due to "Connection timed out," retrying in 10 sec
ERROR:pika.adapters.base_connection:Could not connect: Connection timed out
Traceback (most recent call last):
File "TryOne.py", line 8, in <module>
connection = pika.AsyncoreConnection(pika.ConnectionParameters('192.168.1.93'))
File "/usr/local/lib/python2.7/dist-packages/pika/adapters/base_connection.py", line 63, in __init__
super(BaseConnection, self).__init__(parameters, on_open_callback)
File "/usr/local/lib/python2.7/dist-packages/pika/connection.py", line 513, in __init__
self._connect()
File "/usr/local/lib/python2.7/dist-packages/pika/connection.py", line 804, in _connect
self._adapter_connect()
File "/usr/local/lib/python2.7/dist-packages/pika/adapters/asyncore_connection.py", line 96, in _adapter_connect
super(AsyncoreConnection, self)._adapter_connect()
File "/usr/local/lib/python2.7/dist-packages/pika/adapters/base_connection.py", line 122, in _adapter_connect
self.params.retry_delay)
pika.exceptions.AMQPConnectionError: 10.0
Any help with this problem is very much appreciated.
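For reference, publishing with pika is typically done along these lines (a minimal sketch assuming RabbitMQ's default port 5672 and placeholder credentials; a "Connection timed out" at this level usually points to the host or port being unreachable, e.g. a firewall, rather than a bug in the publishing code):

import pika

credentials = pika.PlainCredentials('guest', 'guest')  # placeholder credentials
params = pika.ConnectionParameters(host='192.168.1.93',
                                   port=5672,
                                   credentials=credentials)
connection = pika.BlockingConnection(params)
channel = connection.channel()
channel.queue_declare(queue='sensor_data')  # hypothetical queue name
channel.basic_publish(exchange='',
                      routing_key='sensor_data',
                      body='sensor reading goes here')
connection.close()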
