I am running a process on Apache Airflow that has a loop in which it reads data from an MSSQL database, adds two columns, and writes the data to another MSSQL database. I am using MsSqlHook to connect to both databases.
The process usually runs fine, but sometimes, after some successful data writes inside the loop, I get the following error message:
ERROR - (20009, b'DB-Lib error message 20009, severity 9:\nUnable to connect: Adaptive Server is unavailable or does not exist (SOURCE_DB.database.windows.net:PORT)\nNet-Lib error during Connection timed out (110)\nDB-Lib error message 20009, severity 9:\nUnable to connect: Adaptive Server is unavailable or does not exist (SOURCE_DB.database.windows.net:PORT)\nNet-Lib error during Connection timed out (110)\n')
Traceback (most recent call last):
  File "src/pymssql.pyx", line 636, in pymssql.connect
  File "src/_mssql.pyx", line 1957, in _mssql.connect
  File "src/_mssql.pyx", line 676, in _mssql.MSSQLConnection.__init__
  File "src/_mssql.pyx", line 1683, in _mssql.maybe_raise_MSSQLDatabaseException
_mssql.MSSQLDatabaseException: (20009, b'DB-Lib error message 20009, severity 9:\nUnable to connect: Adaptive Server is unavailable or does not exist (SOURCE_DB.database.windows.net:PORT)\nNet-Lib error during Connection timed out (110)\nDB-Lib error message 20009, severity 9:\nUnable to connect: Adaptive Server is unavailable or does not exist (SOURCE_DB.database.windows.net:PORT)\nNet-Lib error during Connection timed out (110)\n')
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
  File "/usr/local/lib/python3.7/site-packages/airflow/models/taskinstance.py", line 984, in _run_raw_task
    result = task_copy.execute(context=context)
  File "/usr/local/lib/python3.7/site-packages/airflow/operators/python_operator.py", line 113, in execute
    return_value = self.execute_callable()
  File "/usr/local/lib/python3.7/site-packages/airflow/operators/python_operator.py", line 118, in execute_callable
    return self.python_callable(*self.op_args, **self.op_kwargs)
  File "/usr/local/airflow/dags/DAG_NAME.py", line 156, in readWriteData
    df = readFromSource(query)
  File "/usr/local/airflow/dags/MX_CENT_SAMS_EXIT_APP_ITMS_MIGRATION.py", line 112, in readFromSource
    df = mssql_hook.get_pandas_df(sql=query)
  File "/usr/local/lib/python3.7/site-packages/airflow/hooks/dbapi_hook.py", line 99, in get_pandas_df
    with closing(self.get_conn()) as conn:
  File "/usr/local/lib/python3.7/site-packages/airflow/hooks/mssql_hook.py", line 48, in get_conn
    port=conn.port)
  File "src/pymssql.pyx", line 642, in pymssql.connect
I am guessing this is because the connection to the source database is unstable, and whenever it is interrupted the process can't re-establish it. Is there a way to pause the process or make it wait if the source connection becomes unavailable?
This is my current code:
def readFromSource(query):
    """
    Args: query --> Query to be executed
    Returns: Dataframe with source tables data
    """
    print("Executing readFromSource()")
    mssql_hook = MsSqlHook(mssql_conn_id=SRC_CONN)
    mssql_hook.autocommit = True
    df = mssql_hook.get_pandas_df(sql=query)
    print(f"Source rows: {df.shape[0]}")
    print("readFromSource() execution completed")
    return df
def writeToTarget(df):
    print("Executing writeToTarget()")
    try:
        fast_sql_conn = FastMSSQLConnection(TGT_CONN)
        tgt_conn = fast_sql_conn.getConnection()
        with closing(tgt_conn) as conn:
            df.to_sql(
                name=TGT_TABLE,
                schema='dbo',
                con=conn,
                chunksize=CHUNK_SIZE,
                method='multi',
                index=False,
                if_exists='append'
            )
    except Exception as e:
        print("Error while loading data to target: " + str(e))
    print("writeToTarget() execution completed")
def readWriteData(*op_args, **context):
    """Loads info to target table
    """
    print("Executing readWriteData()")
    partition_column_list = context['ti'].xcom_pull(
        task_ids='getPartitionColumnList')
    parallelProcParams = context['ti'].xcom_pull(
        task_ids='setParallelProcessingParams')
    range_start = parallelProcParams['i'][op_args[0]][0]
    range_len = parallelProcParams['i'][op_args[0]][1]
    for i in range(range_start, range_start + range_len):
        filter_ = partition_column_list[i]
        print(f"Executing for audititemid: {filter_}")
        query = SRC_QUERY + ' and audititemid = ' + str(filter_).replace("[", "").replace("]", "")  # a exit app
        df = readFromSource(query)
        df = df.rename(columns={
            "createdate": "CREAT_DATE", "scannedqty": "SCANNED_QTY",
            "audititemid": "AUDT_ITM_ID", "auditid": "AUDT_ID", "upc": "UPC",
            "itemnbr": "ITM_NBR", "txqty": "TXNS_QTY", "displayname": "DSPLY_NAME",
            "unitprice": "UNIT_PRICE", "cancelled": "CNCL"})
        df['LOADG_CHNNL'] = 'Airflow Exit App DB'
        df['LOADG_DATE'] = datetime.now()
        writeToTarget(df)
    print("readWriteData() execution completed")
You could split the task in two:
Read from DB and persist
Read persisted data and write to DB
The first task will read the data, transform it, and persist it (e.g., on the local disk). The second one will read the persisted data and write it to the DB using a transaction. For the second task, set the number of retries as needed.
Now, if the connection times out, the second task will fail, the changes to the DB will be rolled back, and Airflow will retry the task as many times as you set.
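A minimal sketch of that split, assuming Airflow 1.10-style imports (matching the traceback above); the DAG id, task ids, and staging path are illustrative, and readFromSource/writeToTarget refer to the functions from the question:

from datetime import datetime, timedelta

import pandas as pd
from airflow import DAG
from airflow.operators.python_operator import PythonOperator

STAGING_PATH = '/tmp/staged_rows.csv'  # hypothetical staging location


def extract_to_disk(**context):
    # Task 1: read from the source DB, transform, and persist locally.
    df = readFromSource(SRC_QUERY)  # readFromSource/SRC_QUERY from the question
    df.to_csv(STAGING_PATH, index=False)


def load_from_disk(**context):
    # Task 2: read the staged file and write it to the target DB.
    df = pd.read_csv(STAGING_PATH)
    writeToTarget(df)  # writeToTarget from the question


dag = DAG('read_write_split', start_date=datetime(2020, 1, 1),
          schedule_interval=None)

extract = PythonOperator(task_id='extract_to_disk',
                         python_callable=extract_to_disk,
                         provide_context=True, dag=dag)

# Retries with a delay give a flaky connection time to recover; the same
# arguments can be set on the extract task if the source side is the one
# that drops.
load = PythonOperator(task_id='load_from_disk',
                      python_callable=load_from_disk,
                      provide_context=True, retries=5,
                      retry_delay=timedelta(minutes=2), dag=dag)

extract >> load

Note that for the retries to actually fire, writeToTarget would have to let the exception propagate rather than catching and printing it as it does above.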
I'm trying to connect to a MySQL database. If the server is not responding, my application crashes. I am using try/except, but it looks like two exceptions are raised and the try/except block can't handle them. Can someone figure out where the problem is? Below is my code:
def check_server(server_address):
    con = mysql.connector.connect(host='{}'.format(server_address),
                                  database='domicile_reports',
                                  user='xyz',
                                  password='xyz')
    try:
        if con.is_connected():
            print('{} Connected'.format(server_address))
            con.close()
    except Exception as e:
        print("Can not connect to db. {} Occured".format(e))

check_server('25.13.253.67')
Error displayed on the terminal:
Traceback (most recent call last):
  File "C:\Users\Hamid Shah\AppData\Roaming\Python\Python310\site-packages\mysql\connector\network.py", line 574, in open_connection
    self.sock.connect(sockaddr)
TimeoutError: [WinError 10060] A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
  File "f:\Docs\OneDrive\Python Projects\CFC App\splash_screen_gui.py", line 151, in <module>
    obj = splashscreen()
  File "f:\Docs\OneDrive\Python Projects\CFC App\splash_screen_gui.py", line 51, in __init__
    self.check_server(self.server_2)
  File "f:\Docs\OneDrive\Python Projects\CFC App\splash_screen_gui.py", line 80, in check_server
    con = mysql.connector.connect(host='{}'.format(server_address),
  File "C:\Users\Hamid Shah\AppData\Roaming\Python\Python310\site-packages\mysql\connector\__init__.py", line 273, in connect
    return MySQLConnection(*args, **kwargs)
  File "C:\Users\Hamid Shah\AppData\Roaming\Python\Python310\site-packages\mysql\connector\connection.py", line 116, in __init__
    self.connect(**kwargs)
  File "C:\Users\Hamid Shah\AppData\Roaming\Python\Python310\site-packages\mysql\connector\abstracts.py", line 1052, in connect
    self._open_connection()
  File "C:\Users\Hamid Shah\AppData\Roaming\Python\Python310\site-packages\mysql\connector\connection.py", line 494, in _open_connection
    self._socket.open_connection()
  File "C:\Users\Hamid Shah\AppData\Roaming\Python\Python310\site-packages\mysql\connector\network.py", line 576, in open_connection
    raise errors.InterfaceError(
mysql.connector.errors.InterfaceError: 2003: Can't connect to MySQL server on '25.13.253.67:3306' (10060 A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond)
You need to put the connect() call inside the try/except block. You can also use a context manager to properly close the connection, for example:
import mysql.connector
from mysql.connector.errors import Error


def check_server(server_address):
    try:
        with mysql.connector.connect(
            host=server_address,
            database="domicile_reports",
            user="xyz",
            password="xyz",
        ) as cnx:
            print(f"{server_address} Connected")
    except Error as err:
        print(f"Can not connect to db. {err} occurred")


check_server("25.13.253.67")
I'm getting this SIGTERM error on Airflow 1.10.11 using the LocalExecutor.
[2020-09-21 10:26:51,210] {{taskinstance.py:955}} ERROR - Received SIGTERM. Terminating subprocesses.
The DAG task is doing this:
reading some data from SQL Server (on Windows) into a pandas dataframe.
And then it writes it to a file (it doesn't even get to this part).
The strange thing is that if I limit the number of rows returned by the query (say TOP 100), the DAG succeeds.
If I run the Python code locally on my machine, it succeeds. I'm using pyodbc and SQLAlchemy. It fails on this line after only 20 or 30 seconds:
df_query_results = pd.read_sql(sql_query, engine)
Airflow log
[2020-09-21 10:26:51,210] {{helpers.py:325}} INFO - Sending Signals.SIGTERM to GPID xxx
[2020-09-21 10:26:51,210] {{taskinstance.py:955}} ERROR - Received SIGTERM. Terminating subprocesses.
[2020-09-21 10:26:51,804] {{taskinstance.py:1150}} ERROR - Task received SIGTERM signal
Traceback (most recent call last):
  File "/usr/local/lib/python3.6/site-packages/airflow/models/taskinstance.py", line 984, in _run_raw_task
    result = task_copy.execute(context=context)
  File "/usr/local/airflow/dags/operators/sql_to_avro.py", line 39, in execute
    df_query_results = pd.read_sql(sql_query, engine)
  File "/usr/local/lib64/python3.6/site-packages/pandas/io/sql.py", line 436, in read_sql
    chunksize=chunksize,
  File "/usr/local/lib64/python3.6/site-packages/pandas/io/sql.py", line 1231, in read_query
    data = result.fetchall()
  File "/usr/local/lib64/python3.6/site-packages/sqlalchemy/engine/result.py", line 1216, in fetchall
    e, None, None, self.cursor, self.context
  File "/usr/local/lib64/python3.6/site-packages/sqlalchemy/engine/base.py", line 1478, in _handle_dbapi_exception
    util.reraise(*exc_info)
  File "/usr/local/lib64/python3.6/site-packages/sqlalchemy/util/compat.py", line 153, in reraise
    raise value
  File "/usr/local/lib64/python3.6/site-packages/sqlalchemy/engine/result.py", line 1211, in fetchall
    l = self.process_rows(self._fetchall_impl())
  File "/usr/local/lib64/python3.6/site-packages/sqlalchemy/engine/result.py", line 1161, in _fetchall_impl
    return self.cursor.fetchall()
  File "/usr/local/lib/python3.6/site-packages/airflow/models/taskinstance.py", line 957, in signal_handler
    raise AirflowException("Task received SIGTERM signal")
airflow.exceptions.AirflowException: Task received SIGTERM signal
[2020-09-21 10:26:51,813] {{taskinstance.py:1194}} INFO - Marking task as FAILED.
EDIT:
I missed this earlier, but there is a warning message about the hostname.
WARNING - The recorded hostname da2mgrl001d1.mycompany.corp does not match this instance's hostname airflow-mycompany-dev.i.mct360.com
I had a Linux/network engineer help out. Unfortunately, I don't know the full details, but the fix was to change the hostname_callable setting in airflow.cfg to hostname_callable = socket:gethostname. It was previously set to socket:getfqdn.
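To see why the two settings can disagree on a given machine, you can compare the two calls directly. This is just a quick diagnostic sketch, not part of Airflow itself:

import socket

# What hostname_callable = socket:getfqdn records (depends on DNS / reverse lookup)
print("socket.getfqdn():", socket.getfqdn())

# What hostname_callable = socket:gethostname records (the OS hostname only)
print("socket.gethostname():", socket.gethostname())

If the two outputs differ the way the warning above shows, the hostname check between the scheduler and the task runner can fail.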
Note: I found a couple of different (maybe related?) questions where this was the resolution:
How to fix the error "AirflowException("Hostname of job runner does not match")"?
https://stackoverflow.com/a/59108743/220997
I'm trying to connect to two MySQL databases (one local, one remote) at the same time using Python 3.4, but I'm really struggling. Splitting the problem into three:
Step 1: connect to the local DB. This is working fine using PyMySQL. (MySQLdb isn't compatible with Python 3.4, of course.)
Step 2: connect to the remote DB (which needs to use SSH). I can get it to work from the Linux command prompt but not from Python... see below.
Step 3: connect to both at the same time. I think I'm supposed to use a different port for the remote database so that I can have both connections at the same time, but I'm out of my depth here! If it's relevant then the two DBs will have different names. And if this question isn't directly related, please tell me and I'll post it separately.
Unfortunately I'm not really starting in the right place for a newbie... once I can get this working I can happily go back to basic Python and SQL but hopefully someone will take pity on me and give me a hand to get started!
For Step 2, my code is below. It seems to be quite close to the sshtunnel example which answers this question: Python - SSH Tunnel Setup and MySQL DB Access, though that uses MySQLdb. For the moment I'm embedding the connection parameters; I'll move them to the config file once it's working properly.
import dropbox, pymysql, shlex, shutil, subprocess
from sshtunnel import SSHTunnelForwarder
import iot_config as cfg

def CloseLocalDB():
    localcur.close()
    localdb.close()

def CloseRemoteDB():
    # Disconnect from the database
    # remotecur.close()
    # remotedb.close()
    # Close the SSH tunnel
    # ssh.close()
    print("end of CloseRemoteDB function")

def OpenLocalDB():
    global localcur, localdb
    localdb = pymysql.connect(host=cfg.localdbconn['host'], user=cfg.localdbconn['user'], passwd=cfg.localdbconn['passwd'], db=cfg.localdbconn['db'])
    localcur = localdb.cursor()

def OpenRemoteDB():
    global remotecur, remotedb
    with SSHTunnelForwarder(
            ('my_remote_site', 22),
            ssh_username = "my_ssh_username",
            ssh_private_key = "/etc/ssh/my_private_key.ppk",
            ssh_private_key_password = "my_private_key_password",
            remote_bind_address = ('127.0.0.1', 3308)) as server:
        remotedb = None
        # Following line gives an error if uncommented
        # remotedb = pymysql.connect(host='127.0.0.1', user='remote_db_user', passwd='remote_db_password', db='remote_db_name', port=server.local_bind_port)
        # remotecur = remotedb.cursor()

# Main program starts here
OpenLocalDB()
CloseLocalDB()
OpenRemoteDB()
CloseRemoteDB()
This is the error I'm getting:
2016-04-21 19:13:33,487 | ERROR | Secsh channel 0 open FAILED: Connection refused: Connect failed
2016-04-21 19:13:33,553 | ERROR | In #1 <-- ('127.0.0.1', 60591) to ('127.0.0.1', 3308) failed: ChannelException(2, 'Connect failed')
----------------------------------------
Exception happened during processing of request from ('127.0.0.1', 60591)
Traceback (most recent call last):
  File "/usr/local/lib/python3.4/dist-packages/sshtunnel.py", line 286, in handle
    src_address)
  File "/usr/local/lib/python3.4/dist-packages/paramiko/transport.py", line 834, in open_channel
    raise e
paramiko.ssh_exception.ChannelException: (2, 'Connect failed')
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
  File "/usr/lib/python3.4/socketserver.py", line 613, in process_request_thread
    self.finish_request(request, client_address)
  File "/usr/lib/python3.4/socketserver.py", line 344, in finish_request
    self.RequestHandlerClass(request, client_address, self)
  File "/usr/lib/python3.4/socketserver.py", line 669, in __init__
    self.handle()
  File "/usr/local/lib/python3.4/dist-packages/sshtunnel.py", line 296, in handle
    raise HandlerSSHTunnelForwarderError(msg)
sshtunnel.HandlerSSHTunnelForwarderError: In #1 <-- ('127.0.0.1', 60591) to ('127.0.0.1', 3308) failed: ChannelException(2, 'Connect failed')
----------------------------------------
Traceback (most recent call last):
  File "/home/pi/Documents/iot_pm2/iot_ssh_example_for_help.py", line 38, in <module>
    OpenRemoteDB()
  File "/home/pi/Documents/iot_pm2/iot_ssh_example_for_help.py", line 32, in OpenRemoteDB
    remotedb = pymysql.connect(host='127.0.0.1', user='remote_db_user', passwd='remote_db_password', db='remote_db_name', port=server.local_bind_port)
  File "/usr/local/lib/python3.4/dist-packages/pymysql/__init__.py", line 88, in Connect
    return Connection(*args, **kwargs)
  File "/usr/local/lib/python3.4/dist-packages/pymysql/connections.py", line 678, in __init__
    self.connect()
  File "/usr/local/lib/python3.4/dist-packages/pymysql/connections.py", line 889, in connect
    self._get_server_information()
  File "/usr/local/lib/python3.4/dist-packages/pymysql/connections.py", line 1190, in _get_server_information
    packet = self._read_packet()
  File "/usr/local/lib/python3.4/dist-packages/pymysql/connections.py", line 945, in _read_packet
    packet_header = self._read_bytes(4)
  File "/usr/local/lib/python3.4/dist-packages/pymysql/connections.py", line 981, in _read_bytes
    2013, "Lost connection to MySQL server during query")
pymysql.err.OperationalError: (2013, 'Lost connection to MySQL server during query')
Thanks in advance.
Answering my own question because, with a lot of help from J.M. Fernández on GitHub, I have a solution: the example that I copied at the beginning uses port 3308, but port 3306 is the standard. Once I'd changed this it started working.
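For completeness, a sketch of OpenRemoteDB with that fix applied (same placeholder credentials as the question; the only change is remote_bind_address pointing at MySQL's standard port 3306):

from sshtunnel import SSHTunnelForwarder
import pymysql

with SSHTunnelForwarder(
        ('my_remote_site', 22),
        ssh_username='my_ssh_username',
        ssh_private_key='/etc/ssh/my_private_key.ppk',
        ssh_private_key_password='my_private_key_password',
        remote_bind_address=('127.0.0.1', 3306)) as server:  # was 3308
    # Connect through the tunnel's local end while it is open
    remotedb = pymysql.connect(host='127.0.0.1', user='remote_db_user',
                               passwd='remote_db_password', db='remote_db_name',
                               port=server.local_bind_port)
    remotecur = remotedb.cursor()
    remotecur.close()
    remotedb.close()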
I'm having a problem when closing a connection as follows:
database = 'sed_database'
conn = MySQLdb.Connect(host='remote_host', user='default',
                       passwd='pass', db=database)
try:
    try:
        cursor = conn.cursor()
        cursor.execute(sql_str)
        results = cursor.fetchall()
    except MySQLdb.Error, e:
        print "MySQL/Server Error using query: %s" % sql_str
        print "Using database: %s" % database
        raise e
finally:
    if cursor:
        cursor.close()
    if conn:
        conn.close()
This gives:
Traceback (most recent call last):
  File "trass.py", line 579, in ?
    main(sys.argv)
  File "trass.py", line 555, in main
    old_rows, changes_list = auto_analyse_test(f, args.build, args.quiet, args.debug)
  File "trass.py", line 352, in auto_analyse_test
    last_analysed_build = get_sed_baseline_ref(test_file_name, old_delivery_stream)
  File "trass.py", line 151, in get_sed_baseline_ref
    results = execute_sql_query(sql, delivery_stream)
  File "trass.py", line 197, in execute_sql_query
    passwd='pass', db=database)
  File "C:\Python24\Lib\site-packages\MySQLdb\__init__.py", line 75, in Connect
    return Connection(*args, **kwargs)
  File "C:\Python24\Lib\site-packages\MySQLdb\connections.py", line 164, in __init__
    super(Connection, self).__init__(*args, **kwargs2)
_mysql_exceptions.InternalError: (3, "Error writing file 'D:\\MySQL_Datafiles\\Logfiles\\query.log' (Errcode: 9)")
Python's MySQLdb library info is as follows:
>>> print MySQLdb.get_client_info()
4.1.18
>>> print MySQLdb.__version__
1.2.1_p2
>>> print MySQLdb.__revision__
410
What is strange is that:
I've checked on the server and query.log exists and is being written to by other processes.
This code works through several iterations, then on a particular item it fails.
The exact query runs fine via SQLyog and yields four results.
The server error.log says "Aborted connection... (Got an error reading communication packets)"
While the traceback appears to show the error being associated with creating the connection, it doesn't occur until the connection is closed (or the function ends, which I guess closes it by default). I've tried putting extra output or pauses between the open and the close; every time, the exception occurs on the close. So what could cause this error when closing the connection?
Here's what I found so far.
It appears that the error is triggered when opening a connection, at MySQLdb.Connect(...) (the 2nd line in the pasted code), not when closing a connection.
Full backtrace:
...
execute_sql_query [op]
MySQLdb Connect [op]
MySQLdb super(...) [op]
_mysql.c ConnectionObject_Initialize [lower-level Python module, written in C]
libmysql mysql_real_connect or mysql_options [probably the former]
fails, exception is set
Let's decode the exception
InternalError:
(3,
"Error writing file 'D:\\MySQL_Datafiles\\Logfiles\\query.log'
(Errcode: 9)")
"3" older mysql mysys_err.h EE_WRITE 3
"query.log", is this local or remote log file? appears to be a windows path.
"Errorcode: 9" assuming windows (above), that is ERROR_INVALID_BLOCK "The storage control block address is invalid." Quite cryptic, but it'd go and check if this file exist, if it is writeable, and if it may be subject to logrotate or similar. Check disk space, for a good measure, do a disk check as well.
It appears to be a client-side error. Please check your client-side my.cnf, [client] section.
source code for given MySQLdb version
This awesome code shows a memory leak in Tornado's gen module when connections are closed without reading the response:
import gc
from tornado import web, ioloop, gen

class MainHandler(web.RequestHandler):
    @web.asynchronous
    @gen.engine
    def get(self):
        gc.collect()
        print len(gc.garbage)  # print zombie objects count
        self.a = '*' * 500000000  # ~500MB data
        CHUNK_COUNT = 100
        try:
            for i in xrange(CHUNK_COUNT):
                self.write('*' * 10000)  # write ~10KB of data
                yield gen.Task(self.flush)  # wait for the receiver to receive
            print 'finished'
        finally:
            print 'finally'

application = web.Application([
    (r"/", MainHandler),
])
application.listen(8888)
ioloop.IOLoop.instance().start()
Now run a simple test client multiple times:
#!/usr/bin/python
import urllib
urllib.urlopen('http://127.0.0.1:8888/')  # exit without reading the response
The server output then shows incremental memory usage:
0
WARNING:root:Write error on 8: [Errno 104] Connection reset by peer
1
WARNING:root:Read error on 8: [Errno 104] Connection reset by peer
WARNING:root:error on read
Traceback (most recent call last):
  File "/usr/local/lib/python2.7/dist-packages/tornado-2.4.1-py2.7.egg/tornado/iostream.py", line 361, in _handle_read
    if self._read_to_buffer() == 0:
  File "/usr/local/lib/python2.7/dist-packages/tornado-2.4.1-py2.7.egg/tornado/iostream.py", line 428, in _read_to_buffer
    chunk = self._read_from_socket()
  File "/usr/local/lib/python2.7/dist-packages/tornado-2.4.1-py2.7.egg/tornado/iostream.py", line 409, in _read_from_socket
    chunk = self.socket.recv(self.read_chunk_size)
error: [Errno 104] Connection reset by peer
2
ERROR:root:Uncaught exception GET / (127.0.0.1)
HTTPRequest(protocol='http', host='127.0.0.1:8888', method='GET', uri='/', version='HTTP/1.0', remote_ip='127.0.0.1', body='', headers={'Host': '127.0.0.1:8888', 'User-Agent': 'Python-urllib/1.17'})
Traceback (most recent call last):
  File "/usr/local/lib/python2.7/dist-packages/tornado-2.4.1-py2.7.egg/tornado/web.py", line 1021, in _stack_context_handle_exception
    raise_exc_info((type, value, traceback))
  File "/usr/local/lib/python2.7/dist-packages/tornado-2.4.1-py2.7.egg/tornado/web.py", line 1139, in wrapper
    return method(self, *args, **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/tornado-2.4.1-py2.7.egg/tornado/gen.py", line 120, in wrapper
    runner.run()
  File "/usr/local/lib/python2.7/dist-packages/tornado-2.4.1-py2.7.egg/tornado/gen.py", line 345, in run
    yielded = self.gen.send(next)
  File "test.py", line 10, in get
    self.a = '*' * 500000000
MemoryError
ERROR:root:500 GET / (127.0.0.1) 3.91ms
If you set CHUNK_COUNT to 1, the ~10KB of data fits in the OS connection buffer, the 'finished' and 'finally' texts are printed to the console, and, because the generator has completed, no memory leak occurs.
But the strange part is that if you remove the try/finally block, the problem disappears!! (even with CHUNK_COUNT set to 100)
Is this a bug in CPython or Tornado or ...?!
This bug was tested with Tornado 2.4.1 (the latest version when this question was asked) and reported at https://github.com/facebook/tornado/issues/660 .
The problem was fixed in commit https://github.com/facebook/tornado/commit/769bc52e11656788782a6e7a922ef646503f9ab0 and the fix is included in Tornado 3.0.