Python mysqldb error closing connection

I'm having a problem when closing a connection as follows:
database = 'sed_database'
conn = MySQLdb.Connect(host='remote_host', user='default',
                       passwd='pass', db=database)
try:
    try:
        cursor = conn.cursor()
        cursor.execute(sql_str)
        results = cursor.fetchall()
    except MySQLdb.Error, e:
        print "MySQL/Server Error using query: %s" % sql_str
        print "Using database: %s" % database
        raise e
finally:
    if cursor:
        cursor.close()
    if conn:
        conn.close()
This gives:
Traceback (most recent call last):
File "trass.py", line 579, in ?
main(sys.argv)
File "trass.py", line 555, in main
old_rows, changes_list = auto_analyse_test(f, args.build, args.quiet, args.debug)
File "trass.py", line 352, in auto_analyse_test
last_analysed_build = get_sed_baseline_ref(test_file_name, old_delivery_stream)
File "trass.py", line 151, in get_sed_baseline_ref
results = execute_sql_query(sql, delivery_stream)
File "trass.py", line 197, in execute_sql_query
passwd='pass', db=database)
File "C:\Python24\Lib\site-packages\MySQLdb\__init__.py", line 75, in Connect
return Connection(*args, **kwargs)
File "C:\Python24\Lib\site-packages\MySQLdb\connections.py", line 164, in __init__
super(Connection, self).__init__(*args, **kwargs2)
_mysql_exceptions.InternalError: (3, "Error writing file 'D:\\MySQL_Datafiles\\Logfiles\\query.log' (Errcode: 9)")
Python's MySQLdb library info is as follows:
>>> print MySQLdb.get_client_info()
4.1.18
>>> print MySQLdb.__version__
1.2.1_p2
>>> print MySQLdb.__revision__
410
What is strange is that:
I've checked on the server and query.log exists and is being written to by other processes.
This code works through several iterations, then on a particular item it fails.
The exact query runs fine via SQLyog and yields four results.
The server error.log says "Aborted connection... (Got an error reading communication packets)"
While the traceback appears to associate the error with the connection creation, it doesn't occur until the connection is closed (or the function ends, which I guess closes it by default). I've tried putting extra output and pauses between the open and the close; every time, the exception occurs on the close. So what could cause this error when closing the connection?

Here's what I've found so far.
It appears that the error is triggered when opening a connection, at MySQLdb.Connect(...) (the second line of the pasted code), not when closing it.
Full backtrace:
...
execute_sql_query [op]
MySQLdb Connect [op]
MySQLdb super(...) [op]
_mysql.c ConnectionObject_Initialize [lower-level Python module, written in C]
libmysql mysql_real_connect or mysql_options [probably the former]
fails, exception is set
Let's decode the exception:
InternalError:
(3,
"Error writing file 'D:\\MySQL_Datafiles\\Logfiles\\query.log'
(Errcode: 9)")
"3" corresponds to EE_WRITE (3) in older MySQL's mysys_err.h.
"query.log": is this a local or remote log file? It appears to be a Windows path.
"Errcode: 9": assuming Windows (see above), that is ERROR_INVALID_BLOCK, "The storage control block address is invalid." Quite cryptic, but I'd go and check whether this file exists, whether it is writable, and whether it may be subject to logrotate or similar. Check disk space and, for good measure, run a disk check as well.
It appears to be a client-side error. Please check your client-side my.cnf, [client] section.
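To rule out a stray [client] section, here is a minimal sketch that forces MySQLdb to read a known-good option file via its standard read_default_file/read_default_group connect options; the path and credentials are placeholders:

# A minimal sketch, assuming a known-good option file exists at the
# placeholder path below; credentials are taken from the question.
import MySQLdb

conn = MySQLdb.Connect(host='remote_host', user='default',
                       passwd='pass', db='sed_database',
                       read_default_file='C:\\known_good_my.cnf',  # placeholder path
                       read_default_group='client')
conn.close()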
source code for given MySQLdb version

Related

net-Lib error during Connection timed out (110) on Airflow

I am running a process on Apache Airflow that has a loop in which it reads data from an MSSQL database, adds two columns, and writes the data to another MSSQL database. I am using MsSqlHook to connect to both databases.
The process usually runs fine with a loop that reads and loads the data, but sometimes, after some successful data writes, I get the following error message:
ERROR - (20009, b'DB-Lib error message 20009, severity 9:\nUnable to connect: Adaptive Server is unavailable or does not exist (SOURCE_DB.database.windows.net:PORT)\nNet-Lib error during Connection timed out (110)\nDB-Lib error message 20009, severity 9:\nUnable to connect: Adaptive Server is unavailable or does not exist (SOURCE_DB.database.windows.net:PORT)\nNet-Lib error during Connection timed out (110)\n')
Traceback (most recent call last):
File "src/pymssql.pyx", line 636, in pymssql.connect
File "src/_mssql.pyx", line 1957, in _mssql.connect
File "src/_mssql.pyx", line 676, in _mssql.MSSQLConnection.__init__
File "src/_mssql.pyx", line 1683, in _mssql.maybe_raise_MSSQLDatabaseException
_mssql.MSSQLDatabaseException: (20009, b'DB-Lib error message 20009, severity 9:\nUnable to connect: Adaptive Server is unavailable or does not exist (SOURCE_DB.database.windows.net:PORT)\nNet-Lib error during Connection timed out (110)\nDB-Lib error message 20009, severity 9:\nUnable to connect: Adaptive Server is unavailable or does not exist (SOURCE_DB.database.windows.net:PORT)\nNet-Lib error during Connection timed out (110)\n')
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/usr/local/lib/python3.7/site-packages/airflow/models/taskinstance.py", line 984, in _run_raw_task
result = task_copy.execute(context=context)
File "/usr/local/lib/python3.7/site-packages/airflow/operators/python_operator.py", line 113, in execute
return_value = self.execute_callable()
File "/usr/local/lib/python3.7/site-packages/airflow/operators/python_operator.py", line 118, in execute_callable
return self.python_callable(*self.op_args, **self.op_kwargs)
File "/usr/local/airflow/dags/DAG_NAME.py", line 156, in readWriteData
df = readFromSource(query)
File "/usr/local/airflow/dags/MX_CENT_SAMS_EXIT_APP_ITMS_MIGRATION.py", line 112, in readFromSource
df = mssql_hook.get_pandas_df(sql=query)
File "/usr/local/lib/python3.7/site-packages/airflow/hooks/dbapi_hook.py", line 99, in get_pandas_df
with closing(self.get_conn()) as conn:
File "/usr/local/lib/python3.7/site-packages/airflow/hooks/mssql_hook.py", line 48, in get_conn
port=conn.port)
File "src/pymssql.pyx", line 642, in pymssql.connect
I am guessing this is because the connection to the source database is unstable, and whenever it is interrupted it can't be re-established. Is there a way to pause or make the process wait if the source connection becomes unavailable?
This is my current code:
def readFromSource(query):
    """
    Args: query --> Query to be executed
    Returns: Dataframe with source tables data
    """
    print("Executing readFromSource()")
    mssql_hook = MsSqlHook(mssql_conn_id=SRC_CONN)
    mssql_hook.autocommit = True
    df = mssql_hook.get_pandas_df(sql=query)
    print(f"Source rows: {df.shape[0]}")
    print("readFromSource() execution completed")
    return df

def writeToTarget(df):
    print("Executing writeToTarget()")
    try:
        fast_sql_conn = FastMSSQLConnection(TGT_CONN)
        tgt_conn = fast_sql_conn.getConnection()
        with closing(tgt_conn) as conn:
            df.to_sql(
                name=TGT_TABLE,
                schema='dbo',
                con=conn,
                chunksize=CHUNK_SIZE,
                method='multi',
                index=False,
                if_exists='append'
            )
    except Exception as e:
        print("Error while loading data to target: " + str(e))
    print("writeToTarget() execution completed")

def readWriteData(*op_args, **context):
    """Loads info to target table
    """
    print("Executing readWriteData()")
    partition_column_list = context['ti'].xcom_pull(
        task_ids='getPartitionColumnList')
    parallelProcParams = context['ti'].xcom_pull(
        task_ids='setParallelProcessingParams')
    range_start = parallelProcParams['i'][op_args[0]][0]
    range_len = parallelProcParams['i'][op_args[0]][1]
    for i in range(range_start, range_start + range_len):
        filter_ = partition_column_list[i]
        print(f"Executing for audititemid: {filter_}")
        query = SRC_QUERY + ' and audititemid = ' + str(filter_).replace("[", "").replace("]", "")  # exit app
        df = readFromSource(query)
        df = df.rename(columns={"createdate": "CREAT_DATE", "scannedqty": "SCANNED_QTY", "audititemid": "AUDT_ITM_ID", "auditid": "AUDT_ID", "upc": "UPC", "itemnbr": "ITM_NBR", "txqty": "TXNS_QTY", "displayname": "DSPLY_NAME", "unitprice": "UNIT_PRICE", "cancelled": "CNCL"})
        df['LOADG_CHNNL'] = 'Airflow Exit App DB'
        df['LOADG_DATE'] = datetime.now()
        writeToTarget(df)
    print("readWriteData() execution completed")
You could split the task in two:
Read from DB and persist
Read persisted data and write to DB
The first task will read the data, transform it, and persist it (e.g., on the local disk). The second one will read the persisted data and write it to the DB using a transaction. For the second task, set the number of retries as needed.
Now, if the connection times out, the second task will fail, the changes to the DB will be rolled back, and Airflow will retry the task as many times as you set. A minimal sketch of that split follows.
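This sketch assumes Airflow 1.10 (as in the traceback), an existing dag object, and two hypothetical callables, extract_to_disk and load_from_disk; only the retry settings on the second task matter here:

from datetime import timedelta
from airflow.operators.python_operator import PythonOperator

read_and_persist = PythonOperator(
    task_id='read_and_persist',
    python_callable=extract_to_disk,   # hypothetical: query the source, write e.g. a local parquet/CSV file
    dag=dag,
)
load_persisted = PythonOperator(
    task_id='load_persisted',
    python_callable=load_from_disk,    # hypothetical: read the file, write to the target DB in a transaction
    retries=5,                         # Airflow re-runs this task on connection timeouts
    retry_delay=timedelta(minutes=2),
    dag=dag,
)
read_and_persist >> load_persisted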

peewee.OperationalError: unable to close due to unfinalized statements or unfinished backups

I want to write a script in Python that runs in a loop. The script uses an SQLite database accessed through peewee. I can't put all the code here because it is a few hundred lines long, but I'll show the peewee-related part of my code.
When I run my code only once, everything works fine, but it must run for a few days, so it has to run in a loop. When I loop it, I get this error on the second iteration:
peewee.OperationalError: Connection already opened.
I tried to solve it by simply closing connection using this line:
db.close()
But... then I get this:
File "/usr/local/lib/python3.6/site-packages/peewee.py", line 2677, in
close
self._close(self._state.conn)
File "/usr/local/lib/python3.6/site-packages/peewee.py", line 2683, in _
close
conn.close()
sqlite3.OperationalError: unable to close due to unfinalized statements
or unfinished backups
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "run.py", line 76, in <module>
main()
File "run.py", line 69, in main
db.close()
File "/usr/local/lib/python3.6/site-packages/peewee.py", line 2677, in
close
self._close(self._state.conn)
File "/usr/local/lib/python3.6/site-packages/peewee.py", line 2509, in
__exit__
reraise(new_type, new_type(*exc_args), traceback)
File "/usr/local/lib/python3.6/site-packages/peewee.py", line 186, in
reraise
raise value.with_traceback(tb)
File "/usr/local/lib/python3.6/site-packages/peewee.py", line 2677, in
close
self._close(self._state.conn)
File "/usr/local/lib/python3.6/site-packages/peewee.py", line 2683, in
_close
conn.close()
peewee.OperationalError: unable to close due to unfinalized statements or
unfinished backups
class PeeweeDatabase:
    def __init__(self):
        db.connect()

    @staticmethod
    def create_tables():
        with db:
            db.create_tables([Model1, Model2, Model3])

    @staticmethod
    def save_Problem(view_name, id, link):
        current_table = globals()[view_name]
        try:
            current_table.insert({
                NewProblemCreated.ID: id,
                NewProblemCreated.link: link,
                NewProblemCreated.deliveryDate: 0,
                NewProblemCreated.firstEncounter: datetime.now(),
                NewProblemCreated.latestEncounter: datetime.now(),
                NewProblemCreated.HowMuchTimesSent: 0,
                NewProblemCreated.EncounteredBefore: False,
            }).execute()
            logger.info('Problem {} saved'.format(id))
        except IntegrityError:
            pass

    @staticmethod
    def update_latest_delivery(view_name, pr_id):
        current_table = globals()[view_name]
        (current_table
         .update(deliveryDate=datetime.now(), HowMuchTimesSent=current_table.HowMuchTimesSent + 1)
         .where(current_table.ID == pr_id)
         .execute())

    @staticmethod
    def check_last_delivery(view_name, pr_id):
        current_table = globals()[view_name]
        res = (current_table
               .select(current_table.deliveryDate)
               .where(pr_id == current_table.prID)
               .namedtuples()
               )
        return res[0][0]
Has anyone faced this problem before?
Earlier I used SQL queries directly and had no problems, but I wanted to use an ORM.
My guess is you're using an ancient version of SQLite on your system. The Python sqlite3 driver will use sqlite3_close_v2 if SQLite is > 3.07; otherwise it uses sqlite3_close, which is responsible for this issue.
You can try exhausting any cursors and rolling back any potentially open transactions. You can also run in autocommit mode.
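For instance, a minimal loop-safe sketch, assuming peewee 3.x and the question's db object; do_work() is a placeholder for the real per-iteration queries:

# A minimal sketch, assuming peewee 3.x and a SqliteDatabase named db;
# do_work() is a hypothetical stand-in for the script's actual queries.
while True:
    db.connect(reuse_if_open=True)   # avoids "Connection already opened" on later iterations
    try:
        do_work()
    finally:
        if not db.is_closed():
            db.close()               # succeeds once cursors are exhausted inside do_work()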
Well, I now know how to fix this... I deleted db.connect() and everything works fine. I thought I had already tried this, but it seems I hadn't. I guess when I'm using an ORM I should avoid raw SQL-style commands and let the ORM do things its own way.

Cannot Connect To RDS Instance Using Python pymysql Library

I am attempting to connect to an RDS instance using Python's pymysql library. Unfortunately I'm getting an exception back.
Here is my Python code:
import pymysql

# Connect
connection = pymysql.connect(host='jdbc:mysql://dbname.url.us-east-1.rds.amazonaws.com/dbname',
                             port='3306',
                             user='username',
                             password='pass',
                             database='dbname')
Upon executing it, I receive the following exception:
Traceback (most recent call last):
File "/path/to/file/test.py", line 8, in <module>
database='treemigodb')
File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/pymysql/__init__.py", line 94, in Connect
return Connection(*args, **kwargs)
File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/pymysql/connections.py", line 327, in __init__
self.connect()
File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/pymysql/connections.py", line 629, in connect
raise exc
pymysql.err.OperationalError: (2003, "Can't connect to MySQL server on 'jdbc:mysql://dbname.url.us-east-1.rds.amazonaws.com/dbname' ([Errno 8] nodename nor servname provided, or not known)")
Here is what I've tried:
I've opened up the security group on the AWS side to traffic. I had previously performed this connection in Java, with success. AWS is configured as it should be.
I've seen in other examples that the host format tends to vary. Therefore, I've tried:
jdbc:mysql://dbname.url.us-east-1.rds.amazonaws.com/dbname
jdbc:mysql://url.us-east-1.rds.amazonaws.com/dbname
jdbc:mysql://dbname.url.us-east-1.rds.amazonaws.com/
jdbc:mysql://url.us-east-1.rds.amazonaws.com/
dbname.url.us-east-1.rds.amazonaws.com/dbname
url.us-east-1.rds.amazonaws.com/dbname
dbname.url.us-east-1.rds.amazonaws.com/
Is there some potential setup that I could be missing? I've double-checked that every other parameter matches.
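For what it's worth, pymysql expects a bare hostname rather than a JDBC URL (the [Errno 8] message means DNS lookup failed on the whole jdbc:... string), and an integer port. A sketch with placeholder values:

# A sketch with placeholder values: no jdbc:mysql:// scheme, no /dbname
# suffix, and port as an int rather than a string.
import pymysql

connection = pymysql.connect(host='dbname.url.us-east-1.rds.amazonaws.com',
                             port=3306,
                             user='username',
                             password='pass',
                             database='dbname')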

Accessing HDInsight Hive with python

We have an HDInsight cluster with some tables in Hive. I want to query these tables with Python 3.6 from a client machine (outside Azure).
I have tried using PyHive, pyhs2, and also impyla, but I am running into various problems with all of them.
Does anybody have a working example of accessing HDInsight Hive from Python?
I have very little experience with this and don't know how to configure PyHive (which seems the most promising), especially regarding authorization.
With impyla:
from impala.dbapi import connect
conn = connect(host='redacted.azurehdinsight.net',port=443)
cursor = conn.cursor()
cursor.execute('SELECT * FROM cs_test LIMIT 100')
print(cursor.description) # prints the result set's schema
results = cursor.fetchall()
This gives:
Traceback (most recent call last):
File "C:/git/ml-notebooks/impyla.py", line 3, in <module>
cursor = conn.cursor()
File "C:\Users\chris\Anaconda3\lib\site-packages\impala\hiveserver2.py", line 125, in cursor
session = self.service.open_session(user, configuration)
File "C:\Users\chris\Anaconda3\lib\site-packages\impala\hiveserver2.py", line 995, in open_session
resp = self._rpc('OpenSession', req)
File "C:\Users\chris\Anaconda3\lib\site-packages\impala\hiveserver2.py", line 923, in _rpc
response = self._execute(func_name, request)
File "C:\Users\chris\Anaconda3\lib\site-packages\impala\hiveserver2.py", line 954, in _execute
.format(self.retries))
impala.error.HiveServer2Error: Failed after retrying 3 times
With Pyhive:
from pyhive import hive
conn = hive.connect(host="redacted.azurehdinsight.net",port=443,auth="NOSASL")
#also tried other auth-types, but as i said, i have no clue here
This gives:
Traceback (most recent call last):
File "C:/git/ml-notebooks/PythonToHive.py", line 3, in <module>
conn = hive.connect(host="redacted.azurehdinsight.net",port=443,auth="NOSASL")
File "C:\Users\chris\Anaconda3\lib\site-packages\pyhive\hive.py", line 64, in connect
return Connection(*args, **kwargs)
File "C:\Users\chris\Anaconda3\lib\site-packages\pyhive\hive.py", line 164, in __init__
response = self._client.OpenSession(open_session_req)
File "C:\Users\chris\Anaconda3\lib\site-packages\TCLIService\TCLIService.py", line 187, in OpenSession
return self.recv_OpenSession()
File "C:\Users\chris\Anaconda3\lib\site-packages\TCLIService\TCLIService.py", line 199, in recv_OpenSession
(fname, mtype, rseqid) = iprot.readMessageBegin()
File "C:\Users\chris\Anaconda3\lib\site-packages\thrift\protocol\TBinaryProtocol.py", line 134, in readMessageBegin
sz = self.readI32()
File "C:\Users\chris\Anaconda3\lib\site-packages\thrift\protocol\TBinaryProtocol.py", line 217, in readI32
buff = self.trans.readAll(4)
File "C:\Users\chris\Anaconda3\lib\site-packages\thrift\transport\TTransport.py", line 60, in readAll
chunk = self.read(sz - have)
File "C:\Users\chris\Anaconda3\lib\site-packages\thrift\transport\TTransport.py", line 161, in read
self.__rbuf = BufferIO(self.__trans.read(max(sz, self.__rbuf_size)))
File "C:\Users\chris\Anaconda3\lib\site-packages\thrift\transport\TSocket.py", line 117, in read
buff = self.handle.recv(sz)
ConnectionResetError: [WinError 10054] An existing connection was forcibly closed by the remote host
According to the official document Understand and resolve errors received from WebHCat on HDInsight, it says the following.
What is WebHCat
WebHCat is a REST API for HCatalog, a table and storage management layer for Hadoop. WebHCat is enabled by default on HDInsight clusters, and is used by various tools to submit jobs, get job status, etc. without logging in to the cluster.
So a workaround is to use WebHCat to run the Hive QL from Python; please refer to the Hive documentation to learn and use it. As a reference, there is a similar MSDN thread that discusses it.
Hope it helps.
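For illustration only, a hedged sketch of a WebHCat submission using requests; the cluster name, credentials, and statusdir are placeholders, and the job runs asynchronously (you poll the returned job id afterwards):

# A hedged sketch: HDInsight exposes WebHCat (Templeton) over HTTPS on
# port 443 under /templeton/v1; cluster name and credentials are placeholders.
import requests

cluster = 'redacted.azurehdinsight.net'
auth = ('admin', 'cluster_http_password')  # cluster login (HTTP) credentials

resp = requests.post('https://{}/templeton/v1/hive'.format(cluster),
                     auth=auth,
                     data={'execute': 'SELECT * FROM cs_test LIMIT 100',
                           'statusdir': '/example/webhcat_output'})  # where status/results land
resp.raise_for_status()
print(resp.json())  # contains the job id; poll /templeton/v1/jobs/<id> for status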
Technically you should be able to use the Thrift connector and pyhive but I haven't had any success with this. However I have successfully used the JDBC connector using JayDeBeAPI.
First you need to download the JDBC driver.
http://central.maven.org/maven2/org/apache/hive/hive-jdbc/1.2.1/hive-jdbc-1.2.1-standalone.jar
http://repo1.maven.org/maven2/org/apache/httpcomponents/httpclient/4.4/httpclient-4.4.jar
http://central.maven.org/maven2/org/apache/httpcomponents/httpcore/4.4.4/httpcore-4.4.4.jar
I put mine in /jdbc and used JayDeBeAPI with the following connection string.
edit: You need to add /jdbc/* to your CLASSPATH environment variable.
import jaydebeapi

conn = jaydebeapi.connect("org.apache.hive.jdbc.HiveDriver",
                          "jdbc:hive2://my_ip_or_url:443/;ssl=true;transportMode=http;httpPath=/hive2",
                          [username, password],
                          "/jdbc/hive-jdbc-1.2.1.jar")

Connection to remote MySQL db from Python 3.4

I'm trying to connect to two MySQL databases (one local, one remote) at the same time using Python 3.4, but I'm really struggling. Splitting the problem into three:
Step 1: connect to the local DB. This is working fine using PyMySQL. (MySQLdb isn't compatible with Python 3.4, of course.)
Step 2: connect to the remote DB (which needs to use SSH). I can get it to work from the Linux command prompt but not from Python... see below.
Step 3: connect to both at the same time. I think I'm supposed to use a different port for the remote database so that I can have both connections at the same time, but I'm out of my depth here! If it's relevant, the two DBs will have different names. And if this question isn't directly related, please tell me and I'll post it separately.
Unfortunately I'm not really starting in the right place for a newbie... once I can get this working I can happily go back to basic Python and SQL, but hopefully someone will take pity on me and give me a hand getting started!
For Step 2, my code is below. It seems quite close to the sshtunnel example that answers the question Python - SSH Tunnel Setup and MySQL DB Access, though that uses MySQLdb. For the moment I'm embedding the connection parameters; I'll move them to the config file once it's working properly.
import dropbox, pymysql, shlex, shutil, subprocess
from sshtunnel import SSHTunnelForwarder
import iot_config as cfg

def CloseLocalDB():
    localcur.close()
    localdb.close()

def CloseRemoteDB():
    # Disconnect from the database
    # remotecur.close()
    # remotedb.close()
    # Close the SSH tunnel
    # ssh.close()
    print("end of CloseRemoteDB function")

def OpenLocalDB():
    global localcur, localdb
    localdb = pymysql.connect(host=cfg.localdbconn['host'], user=cfg.localdbconn['user'], passwd=cfg.localdbconn['passwd'], db=cfg.localdbconn['db'])
    localcur = localdb.cursor()

def OpenRemoteDB():
    global remotecur, remotedb
    with SSHTunnelForwarder(
            ('my_remote_site', 22),
            ssh_username="my_ssh_username",
            ssh_private_key="/etc/ssh/my_private_key.ppk",
            ssh_private_key_password="my_private_key_password",
            remote_bind_address=('127.0.0.1', 3308)) as server:
        remotedb = None
        # Following line gives an error if uncommented
        # remotedb = pymysql.connect(host='127.0.0.1', user='remote_db_user', passwd='remote_db_password', db='remote_db_name', port=server.local_bind_port)
        # remotecur = remotedb.cursor()

# Main program starts here
OpenLocalDB()
CloseLocalDB()
OpenRemoteDB()
CloseRemoteDB()
This is the error I'm getting:
2016-04-21 19:13:33,487 | ERROR | Secsh channel 0 open FAILED: Connection refused: Connect failed
2016-04-21 19:13:33,553 | ERROR | In #1 <-- ('127.0.0.1', 60591) to ('127.0.0.1', 3308) failed: ChannelException(2, 'Connect failed')
----------------------------------------
Exception happened during processing of request from ('127.0.0.1', 60591)
Traceback (most recent call last):
File "/usr/local/lib/python3.4/dist-packages/sshtunnel.py", line 286, in handle
src_address)
File "/usr/local/lib/python3.4/dist-packages/paramiko/transport.py", line 834, in open_channel
raise e
paramiko.ssh_exception.ChannelException: (2, 'Connect failed')
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/usr/lib/python3.4/socketserver.py", line 613, in process_request_thread
self.finish_request(request, client_address)
File "/usr/lib/python3.4/socketserver.py", line 344, in finish_request
self.RequestHandlerClass(request, client_address, self)
File "/usr/lib/python3.4/socketserver.py", line 669, in __init__
self.handle()
File "/usr/local/lib/python3.4/dist-packages/sshtunnel.py", line 296, in handle
raise HandlerSSHTunnelForwarderError(msg)
sshtunnel.HandlerSSHTunnelForwarderError: In #1 <-- ('127.0.0.1', 60591) to ('127.0.0.1', 3308) failed: ChannelException(2, 'Connect failed')
----------------------------------------
Traceback (most recent call last):
File "/home/pi/Documents/iot_pm2/iot_ssh_example_for_help.py", line 38, in <module>
OpenRemoteDB()
File "/home/pi/Documents/iot_pm2/iot_ssh_example_for_help.py", line 32, in OpenRemoteDB
remotedb = pymysql.connect(host='127.0.0.1', user='remote_db_user', passwd='remote_db_password', db='remote_db_name', port=server.local_bind_port)
File "/usr/local/lib/python3.4/dist-packages/pymysql/__init__.py", line 88, in Connect
return Connection(*args, **kwargs)
File "/usr/local/lib/python3.4/dist-packages/pymysql/connections.py", line 678, in __init__
self.connect()
File "/usr/local/lib/python3.4/dist-packages/pymysql/connections.py", line 889, in connect
self._get_server_information()
File "/usr/local/lib/python3.4/dist-packages/pymysql/connections.py", line 1190, in _get_server_information
packet = self._read_packet()
File "/usr/local/lib/python3.4/dist-packages/pymysql/connections.py", line 945, in _read_packet
packet_header = self._read_bytes(4)
File "/usr/local/lib/python3.4/dist-packages/pymysql/connections.py", line 981, in _read_bytes
2013, "Lost connection to MySQL server during query")
pymysql.err.OperationalError: (2013, 'Lost connection to MySQL server during query')
Thanks in advance.
Answering my own question because, with a lot of help from J.M. Fernández on GitHub, I have a solution: the example that I copied at the beginning uses port 3308, but port 3306 is the standard. Once I'd changed this, it started working.
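For completeness, a sketch of the working version, built from the question's own code with the remote bind corrected to MySQL's default port 3306; hostnames and credentials are the same placeholders as above:

from sshtunnel import SSHTunnelForwarder
import pymysql

with SSHTunnelForwarder(
        ('my_remote_site', 22),
        ssh_username='my_ssh_username',
        ssh_private_key='/etc/ssh/my_private_key.ppk',
        ssh_private_key_password='my_private_key_password',
        remote_bind_address=('127.0.0.1', 3306)) as server:  # 3306, not 3308
    remotedb = pymysql.connect(host='127.0.0.1', user='remote_db_user',
                               passwd='remote_db_password', db='remote_db_name',
                               port=server.local_bind_port)  # the tunnel's local end
    remotecur = remotedb.cursor()
    remotecur.execute('SELECT 1')  # sanity check
    remotecur.close()
    remotedb.close()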
