Python: JDBC connection error to Apache Drill with JayDeBeApi

I am trying to connect to Apache Drill from Python using the jaydebeapi library.
I have started Drill in embedded mode via drill-embedded, and the web UI runs correctly on port 8047. I am then trying to connect via JDBC from a Python script:
import jaydebeapi
import jpype
import os
DRILL_HOME = os.environ["DRILL_HOME"]
classpath = DRILL_HOME + "/jars/jdbc-driver/drill-jdbc-all-1.17.0.jar"
jpype.startJVM(jpype.getDefaultJVMPath(), "-Djava.class.path=%s" % classpath)
conn = jaydebeapi.connect(
    'org.apache.drill.jdbc.Driver',
    'jdbc:drill:drillbit=localhost:8047'
)
but I get this error:
SLF4J: Failed to load class "org.slf4j.impl.StaticLoggerBinder".
SLF4J: Defaulting to no-operation (NOP) logger implementation
SLF4J: See http://www.slf4j.org/codes.html#StaticLoggerBinder for further details.
Traceback (most recent call last):
  File "jaydebe_drill.py", line 10, in <module>
    'jdbc:drill:drillbit=localhost:8047'
  File "/Users/user/opt/anaconda3/lib/python3.7/site-packages/jaydebeapi/__init__.py", line 412, in connect
    jconn = _jdbc_connect(jclassname, url, driver_args, jars, libs)
  File "/Users/user/opt/anaconda3/lib/python3.7/site-packages/jaydebeapi/__init__.py", line 230, in _jdbc_connect_jpype
    return jpype.java.sql.DriverManager.getConnection(url, *dargs)
jpype._jexception.SQLNonTransientConnectionExceptionPyRaisable: java.sql.SQLNonTransientConnectionException: Failure in connecting to Drill: oadd.org.apache.drill.exec.rpc.ChannelClosedException: Channel closed /127.0.0.1:62244 <--> localhost/127.0.0.1:8047.
Does anyone know how to solve this issue?

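Thanks to @Luke Woodward's suggestion, the problem turned out to be the port. Port 8047 serves Drill's web UI, not JDBC connections, so the driver's channel to it gets closed. With drill-embedded there is no need to specify a port in the URL. Below is a full query example: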
import jaydebeapi
import jpype
import os
import pandas as pd
DRILL_HOME = os.environ["DRILL_HOME"]
classpath = DRILL_HOME + "/jars/jdbc-driver/drill-jdbc-all-1.17.0.jar"
jpype.startJVM(jpype.getDefaultJVMPath(), "-Djava.class.path=%s" % classpath)
conn = jaydebeapi.connect(
    'org.apache.drill.jdbc.Driver',
    'jdbc:drill:drillbit=localhost'
)
cursor = conn.cursor()
query = """
SELECT *
FROM dfs.`/Users/user/data.parquet`
LIMIT 1
"""
cursor.execute(query)
columns = [c[0] for c in cursor.description]
data = cursor.fetchall()
df = pd.DataFrame(data, columns=columns)
df.head()
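To make the port handling explicit, here is a minimal sketch of a helper that builds the direct-drillbit URL used above. The function name `drill_jdbc_url` is hypothetical, and the value 31010 is only what I understand to be Drill's default user port; check your own Drill configuration before relying on it.

```python
def drill_jdbc_url(host="localhost", port=None):
    """Build a direct-drillbit JDBC URL.

    When port is None the port is omitted, so the JDBC driver
    falls back to whatever default it is configured with.
    """
    target = host if port is None else "{}:{}".format(host, port)
    return "jdbc:drill:drillbit={}".format(target)


# URL as used in the working example above (no port).
print(drill_jdbc_url())
# URL with an explicit port; 31010 is an assumption, not taken from the question.
print(drill_jdbc_url(port=31010))
```

The string returned by `drill_jdbc_url()` can be passed as the second argument to `jaydebeapi.connect`, exactly as in the example above.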

Related

Remote Connection fails in setup of Python data-science client for SQL Server Machine Learning Services

I am trying to test the remote connection of a Python data-science client with SQL Server Machine Learning Services following this guide: https://learn.microsoft.com/en-us/sql/machine-learning/python/setup-python-client-tools-sql (section 6).
Running the following script
def send_this_func_to_sql():
    from revoscalepy import RxSqlServerData, rx_import
    from pandas.tools.plotting import scatter_matrix
    import matplotlib.pyplot as plt
    import io
    # remember the scope of the variables in this func are within our SQL Server Python Runtime
    connection_string = "Driver=SQL Server;Server=localhost\instance02;Database=testmlsiris;Trusted_Connection=Yes;"
    # specify a query and load into pandas dataframe df
    sql_query = RxSqlServerData(connection_string=connection_string, sql_query = "select * from iris_data")
    df = rx_import(sql_query)
    scatter_matrix(df)
    # return bytestream of image created by scatter_matrix
    buf = io.BytesIO()
    plt.savefig(buf, format="png")
    buf.seek(0)
    return buf.getvalue()
new_db_name = "testmlsiris"
connection_string = "driver={sql server};server=sqlrzs\instance02;database=%s;trusted_connection=yes;"
from revoscalepy import RxInSqlServer, rx_exec
# create a remote compute context with connection to SQL Server
sql_compute_context = RxInSqlServer(connection_string=connection_string%new_db_name)
# use rx_exec to send the function execution to SQL Server
image = rx_exec(send_this_func_to_sql, compute_context=sql_compute_context)[0]
yields the following error message, returned by rx_exec and stored in the image variable:
connection_string: "driver={sql server};server=sqlrzs\instance02;database=testmlsiris;trusted_connection=yes;"
num_tasks: 1
execution_timeout_seconds: 0
wait: True
console_output: False
auto_cleanup: True
packages_to_load: []
description: "sqlserver"
version: "1.0"
XXX lineno: 2, opcode: 0
Traceback (most recent call last):
File "<string>", line 3, in <module>
File "E:\SQL\MSSQL15.INSTANCE02\PYTHON_SERVICES\lib\site-packages\revoscalepy\computecontext\RxInSqlServer.py", line 664, in rx_sql_satellite_pool_call
exec(inputfile.read())
File "<string>", line 34, in <module>
File "E:\SQL\MSSQL15.INSTANCE02\PYTHON_SERVICES\lib\site-packages\revoscalepy\computecontext\RxInSqlServer.py", line 886, in rx_remote_call
results = rx_resumeexecution(state_file = inputfile, patched_server_name=args["hostname"])
File "E:\SQL\MSSQL15.INSTANCE02\PYTHON_SERVICES\lib\site-packages\revoscalepy\computecontext\RxInSqlServer.py", line 135, in rx_resumeexecution
return _state["function"](**_state["args"])
File "C:\Users\username\sendtosql.py", line 2, in send_this_func_to_sql
SystemError: unknown opcode
====== sqlrzs ( process 0 ) has started run at 2022-06-29 13:47:04 W. Europe Daylight Time ======
{'local_state': {}, 'args': {}, 'function': <function send_this_func_to_sql at 0x0000020F5810F1E0>}
What is going wrong here? Line 2 in the script is just an import (which works when testing Python scripts on SQL Server directly). Any help is appreciated - thanks.
I just figured out the reason. As of today, the Python data-science client described in https://learn.microsoft.com/de-de/sql/machine-learning/python/setup-python-client-tools-sql?view=sql-server-ver15 is not the newest one (it ships revoscalepy version 9.3), while the Machine Learning Services running in our SQL Server are already at version 9.4.7.
The revoscalepy libraries on the client and on the server must be the same version; otherwise deserialization of the function fails on the server side.
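An "unknown opcode" error typically points to bytecode produced by one Python version being executed by another, which fits a client/server mismatch. As a rough sketch of a pre-flight check, the helper below compares two dotted version strings on their major and minor parts. The function names are hypothetical, and the server-side version string is assumed to have been obtained separately (for example by running a small script on the server); nothing here calls revoscalepy.

```python
import sys


def parse_major_minor(version):
    """Return (major, minor) as integers from a dotted version string."""
    parts = version.strip().split(".")
    return int(parts[0]), int(parts[1])


def same_major_minor(client_version, server_version):
    """True when both versions agree on their major and minor components."""
    return parse_major_minor(client_version) == parse_major_minor(server_version)


# Version of the interpreter running this script (the client side).
client_python = "{}.{}.{}".format(*sys.version_info[:3])
print("client Python:", client_python)

# The versions reported in the answer above differ in the minor component.
print(same_major_minor("9.3", "9.4.7"))
```

Running such a check before calling rx_exec would turn a confusing remote failure into an early, readable one.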

How to execute a MySQL query with a python script using the MySQLdb library?

I tried to modify an ETL, but I found that the previous developer executes his commands directly on the connection (the ETL has been running for a few years). When I try to do the same, I get an error, because the library in my environment expects the command to be executed from a cursor.
from etl.utils.logging import info
from etl.mysql.connect import db, db_name
from etl.mysql.operations import add_column_if_not_exists
from etl.utils.array import chunks
from pprint import pprint
def add_column_exclude_from_statistics():
    with db as c:
        # Create new columns where exclude_from_statistics
        info("Creating column exclude from statistics")
        c.execute("""
            UPDATE orders
            INNER JOIN contacts ON orders.id = contacts.`Contact ID`
            IF contacts.`Great Benefactor` = true OR orders.Campaign = `nuit-pour-la-mission`
            SET orders.exclude_from_statistics = 1
            ELSE
            SET orders.exclude_from_statistics = 0
            ;
        """)
def main():
    info("Table crm.orders")
    add_column_exclude_from_statistics()
if __name__ == '__main__':
    main()
But it returns that 'Connection' object has no attribute 'execute':
(venv) C:\Users\antoi\Documents\Programming\Work\data-tools>py -m etl.task.crm_orders_exclude_from_statistics
2021-06-25 17:12:44.357297 - Connecting to database hozana_data...
2021-06-25 17:12:44.365267 - Connecting to archive database hozana_archive...
2021-06-25 17:12:44.365267 - Table crm.orders
2021-06-25 17:12:44.365267 - Creating column exclude from statistics
Traceback (most recent call last):
File "C:\Users\antoi\AppData\Local\Programs\Python\Python39\lib\runpy.py", line 197, in _run_module_as_main
return _run_code(code, main_globals, None,
File "C:\Users\antoi\AppData\Local\Programs\Python\Python39\lib\runpy.py", line 87, in _run_code
exec(code, run_globals)
File "C:\Users\antoi\Documents\Programming\Work\data-tools\etl\task\crm_orders_exclude_from_statistics.py", line 28, in <module>
main()
File "C:\Users\antoi\Documents\Programming\Work\data-tools\etl\task\crm_orders_exclude_from_statistics.py", line 24, in main
add_column_exclude_from_statistics()
File "C:\Users\antoi\Documents\Programming\Work\data-tools\etl\task\crm_orders_exclude_from_statistics.py", line 12, in add_column_exclude_from_statistics
c.execute("""
AttributeError: 'Connection' object has no attribute 'execute'
Here is what we have in etl.mysql.connect:
import os
import MySQLdb
from etl.utils.logging import info
db_host = os.environ['DB_HOST']
db_port = int(os.environ['DB_PORT'])
db_user = os.environ['DB_USER']
db_password = os.environ['DB_PASSWORD']
db_name = os.environ['DB_NAME']
db_name_archive = os.environ['DB_ARCHIVE_NAME']
info("Connecting to database {}...".format(db_name))
db = MySQLdb.connect(host=db_host,
                     port=db_port,
                     db=db_name,
                     user=db_user,
                     passwd=db_password)
It is strange to have done that, isn't it? Is it my MySQLdb library that is not up to date?
Here are the MySQL related libraries. I did not find MySQLdb:
(venv) C:\Users\antoi\Documents\Programming\Work\data-tools>pip list |findstr mysql
mysql 0.0.3
mysql-connector-python 8.0.25
mysqlclient 2.0.3
According to the documentation, you first need to create a cursor once the connection is open: the 'Connection' object does not have an execute method, but the Cursor object does. Using your code sample:
from etl.utils.logging import info
from etl.mysql.connect import db, db_name
from etl.mysql.operations import add_column_if_not_exists
from etl.utils.array import chunks
from pprint import pprint
def add_column_exclude_from_statistics():
    with db as c:
        # Create new columns where exclude_from_statistics
        info("Creating column exclude from statistics")
        cursor = c.cursor()  # Get the cursor
        cursor.execute("""
            UPDATE orders
            INNER JOIN contacts ON orders.id = contacts.`Contact ID`
            IF contacts.`Great Benefactor` = true OR orders.Campaign = `nuit-pour-la-mission`
            SET orders.exclude_from_statistics = 1
            ELSE
            SET orders.exclude_from_statistics = 0
            ;
        """)
def main():
    info("Table crm.orders")
    add_column_exclude_from_statistics()
if __name__ == '__main__':
    main()
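Separately from the cursor issue, the UPDATE statement itself may need attention: as far as I know, MySQL does not accept IF ... ELSE inside an UPDATE, and backticks quote identifiers rather than string values. The sketch below shows the cursor pattern together with a CASE expression. It uses Python's built-in sqlite3 module as a stand-in for MySQLdb (both follow the DB-API cursor interface), and the table layout and rows are simplified and hypothetical: the benefactor flag is placed directly on the orders table instead of being joined from contacts.

```python
import sqlite3
from contextlib import closing

# In-memory database used only for illustration (stand-in for the MySQL server).
conn = sqlite3.connect(":memory:")

with closing(conn.cursor()) as cur:
    # Hypothetical, simplified schema: no separate contacts table.
    cur.execute(
        "CREATE TABLE orders ("
        "id INTEGER PRIMARY KEY, campaign TEXT, "
        "great_benefactor INTEGER, exclude_from_statistics INTEGER)"
    )
    cur.executemany(
        "INSERT INTO orders (id, campaign, great_benefactor) VALUES (?, ?, ?)",
        [(1, "nuit-pour-la-mission", 0), (2, "other", 1), (3, "other", 0)],
    )
    # Statements are executed on the cursor, not on the connection.
    # CASE replaces the IF/ELSE; the campaign name is a quoted string literal.
    cur.execute("""
        UPDATE orders
        SET exclude_from_statistics = CASE
            WHEN great_benefactor = 1 OR campaign = 'nuit-pour-la-mission' THEN 1
            ELSE 0
        END
    """)
    conn.commit()
    cur.execute("SELECT id, exclude_from_statistics FROM orders ORDER BY id")
    rows = cur.fetchall()

conn.close()
print(rows)
```

With MySQLdb the same CASE expression can be combined with the INNER JOIN from the original query; only the connection setup differs.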

Connecting python to oracle - DatabaseError: Error while trying to retrieve text for error ORA-01804

I get "DatabaseError: Error while trying to retrieve text for error ORA-01804" when I try to connect Python to Oracle. I have downloaded Instant Client 19.3.0.0.0
and I am using a MacBook (macOS 10.15.3). Here is my code:
! pip install cx_Oracle
import io
import base64
from urllib.request import urlopen
import os
os.chdir("/Users/aa/Option/Oracle/instantclient_19_3-2") # use the path we copied from step 5
import cx_Oracle
dsn_tns = cx_Oracle.makedsn("aaa", "bbb", service_name="ccc")
conn = cx_Oracle.connect(user= "ddd", password="222", dsn=dsn_tns)
c = conn.cursor()
it returns:
Requirement already satisfied: cx_Oracle in /Users/aa/opt/anaconda3/lib/python3.7/site-packages (7.3.0)
---------------------------------------------------------------------------
DatabaseError Traceback (most recent call last)
<ipython-input-17-d2cb8e0df445> in <module>
10
11 dsn_tns = cx_Oracle.makedsn("aaa", "bbb", service_name="ccc")
---> 12 conn = cx_Oracle.connect(user= "ddd", password="222", dsn=dsn_tns)
13 c = conn.cursor()
DatabaseError: Error while trying to retrieve text for error ORA-01804
How can I solve it? Thank you very much.
Make sure that your ORACLE_HOME and LD_LIBRARY_PATH environment variables are correctly set (or unset) before you run your script.
https://github.com/oracle/python-cx_Oracle/issues/363
https://github.com/oracle/python-cx_Oracle/issues/346
Error while trying to retrieve text for error ORA-01804
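A quick way to see what the script will actually inherit is to print the relevant variables just before connecting. This is only a sketch: the helper name `oracle_env_report` is hypothetical, and DYLD_LIBRARY_PATH is added on the assumption that it is the macOS counterpart of LD_LIBRARY_PATH; it is not mentioned in the answer above.

```python
import os


def oracle_env_report(environ=None):
    """Return the Oracle-related environment variables, None where unset."""
    environ = os.environ if environ is None else environ
    # DYLD_LIBRARY_PATH is an assumption for macOS, not from the original answer.
    names = ("ORACLE_HOME", "LD_LIBRARY_PATH", "DYLD_LIBRARY_PATH")
    return {name: environ.get(name) for name in names}


# Print the current values; a stale ORACLE_HOME is worth checking first.
for name, value in oracle_env_report().items():
    print("{} = {!r}".format(name, value))
```

If ORACLE_HOME shows a path that does not belong to the Instant Client in use, unsetting it before starting Python is the first thing to try.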

Python MySQLdb Connection Import Error

I am trying to connect with db using the following code:
import MySQLdb
db = MySQLdb.connect(host="localhost",  # your host, usually localhost
                     user="root",       # your username
                     passwd="root",     # your password
                     db="test101")      # name of the database
# you must create a Cursor object. It will let
# you execute all the queries you need
cur = db.cursor()
# Use all the SQL you like
cur.execute("SELECT * FROM test1")
# print the first cell of every row
for row in cur.fetchall():
    print(row[0])
db.close()
However, I am getting the following error message on the console:
Traceback (most recent call last):
File "C:\Users\JRambo\workspace\DBConnection\src\DBConnection.py", line 6, in <module>
import MySQLdb
File "C:\Python27\lib\site-packages\MySQLdb\__init__.py", line 19, in <module>
import _mysql
ImportError: DLL load failed: %1 is not a valid Win32 application.
I have followed the steps in "How do I connect to a MySQL Database in Python?" meticulously.
You might want to verify that your Python and MySQLdb builds have the same bitness. If you have 32-bit Python and 64-bit MySQLdb (or the other way round), it won't work. I had a similar problem with the same traceback, and once I installed matching builds of each, bingo! Hope this helps!
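To find out which build of Python you are running, a small check using only the standard library is enough. The helper name `python_bits` is just for illustration.

```python
import platform
import struct


def python_bits():
    """Return 32 or 64, derived from the pointer size of this interpreter."""
    return struct.calcsize("P") * 8


# platform.architecture() reports the same information as a string.
print("Python is {}-bit ({})".format(python_bits(), platform.architecture()[0]))
```

Install the MySQLdb build that matches the number printed here.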

Connect to DB2 via JayDeBeApi JDBC in Python

I've been struggling for a while to connect to DB2 from a Python client on OS X (Mavericks). A valid option seems to be JayDeBeApi, but when I run the following code...
import jaydebeapi
import jpype
jar = '/opt/IBM/db2/V10.1/java/db2jcc4.jar' # location of the jdbc driver jar
args='-Djava.class.path=%s' % jar
jvm = jpype.getDefaultJVMPath()
jpype.startJVM(jvm, args)
jaydebeapi.connect('com.ibm.db2.jcc.DB2Driver',
                   'jdbc:db2://server:port/database','myusername','mypassword')
I get the following error:
Traceback (most recent call last):
File "<pyshell#67>", line 2, in <module>
'jdbc:db2://server:port/database','myusername','mypassword')
File "/Library/Python/2.7/site-packages/jaydebeapi/dbapi2.py", line 269, in connect
jconn = _jdbc_connect(jclassname, jars, libs, *driver_args)
File "/Library/Python/2.7/site-packages/jaydebeapi/dbapi2.py", line 117, in _jdbc_connect_jpype
return jpype.java.sql.DriverManager.getConnection(*driver_args)
com.ibm.db2.jcc.am.SqlSyntaxErrorExceptionPyRaisable: com.ibm.db2.jcc.am.SqlSyntaxErrorException: [jcc][t4][10205][11234][3.63.123] Null userid is not supported. ERRORCODE=-4461, SQLSTATE=42815
So basically I'm connecting to the server, but for some reason the username and password I provide are not being used. Any idea how to pass the username and password correctly? I can't find any documentation covering exactly this problem, so any suggestions or tips are welcome.
Never mind... I wasn't passing the parameters as a LIST. With the following change it now works:
jaydebeapi.connect(
    'com.ibm.db2.jcc.DB2Driver',
    ['jdbc:db2://server:port/database','myusername','mypassword']
)
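The fix above comes down to bundling the URL and credentials into one list before handing them to the driver. A minimal sketch of a helper that builds that list is shown below; the function name `db2_driver_args` and the example host, port, and database values are hypothetical, and the list layout simply mirrors the call in the answer for the JayDeBeApi version used there.

```python
def db2_driver_args(host, port, database, user, password):
    """Return [url, user, password] in the order used by the call above."""
    url = "jdbc:db2://{}:{}/{}".format(host, port, database)
    return [url, user, password]


# Placeholder values for illustration only.
args = db2_driver_args("server", 50000, "database", "myusername", "mypassword")
print(args[0])
```

The resulting list would then be passed as the second argument to `jaydebeapi.connect`, as in the working call above.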
