Using jaydebeapi3 to connect to Apache Phoenix - python

I have a program that uses the phoenixdb package developed by Lukas Lalinsky, but over the past few days it has become very unstable. I think this is due to the size of the database, which is constantly growing. By unstable I mean that around half of my queries fail with a runtime exception.
So I have moved on and tried to find a more stable way to connect to my Phoenix "server". I want to try a JDBC connection; as far as I understand, Phoenix has good JDBC integration.
However, I have trouble understanding how to set up the initial connection.
I read the Usage section of the JayDeBeApi package documentation, but I don't know what the driver class is, where it is located, whether I have to download it myself, how to set it up, and so on.
I was hoping someone in here would know and hopefully explain it in detail.
Thanks!
EDIT:
I've managed to figure out that my connect statement should be something along these lines:
import jaydebeapi as jdbc
conn = jdbc.connect('org.apache.phoenix.jdbc.PhoenixDriver', ['jdbc:phoenix:<ip>:<port>:', '', ''], '<location-of-phoenix-client.jar>')
However, I still don't know where to get my hands on that phoenix-client.jar file or how to reference it.

I managed to find the solution after having set up a Java project and testing out JDBC in that development environment and getting a successful connection.
To get the JDBC connection working in Java I used the JDBC driver found in the Phoenix distribution from Apache here. I used the driver that matched my Phoenix and HBase versions - phoenix-4.9.0-HBase-1.2-client.jar
Once that setup was completed and I could connect to Phoenix using Java I started trying to set it up using Python. I started a connection to Phoenix with the following:
import jaydebeapi as jdbc
import os
cwd = os.getcwd()
jar = cwd + '/phoenix-4.9.0-HBase-1.2-client.jar'
drivername = 'org.apache.phoenix.jdbc.PhoenixDriver'
url = 'jdbc:phoenix:<ip>:<port>/'
conn = jdbc.connect(drivername, url, jar)
Now I had a successful connection through JDBC to Phoenix using Python. Hope someone else out there can use this question in the future.
I created a cursor and could then issue queries as follows:
cursor = conn.cursor()
sql = """SELECT ...."""
cursor.execute(sql)
resp = cursor.fetchone() # could use .fetchall() or .fetchmany() if needed
I hope this helps someone out there!

Related

How will setting autocommit = True affect queries from python to Hive server when calling pyodbc.connect()

I am trying to connect a Jupyter notebook running in a conda environment to a Hadoop cluster through Apache Hive on Cloudera. I understand from this post that I should install/set up the Cloudera ODBC driver and use pyodbc with a connection as follows:
import pyodbc
import pandas as pd
with pyodbc.connect("DSN=<replace DSN name>", autocommit=True) as conn:
    df = pd.read_sql("<Hive Query>", conn)
My question is about the autocommit parameter. I see in the pyodbc connection documentation that setting autocommit to True will make it so that I don't have to explicitly commit transactions, but it doesn't specify what that actually means.
What exactly is a transaction?
I want to select data from the hive server using pd.read_sql_query() but I don't want to make any changes to the actual data on the server.
Apologies if this question is formatted incorrectly or if there are (seemingly simple) details I'm overlooking in my question - this is my first time posting on stackoverflow and I'm new to working with cloudera / Hive.
I haven't tried connecting yet or running any queries yet because I don't want to mess up anything on the server.
Hive does not have the concept of commits and transactions the way RDBMS systems do.
You should not need to worry about autocommit.
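For background on what a transaction actually is, in databases that do support them, here is a minimal sketch using Python's built-in sqlite3 module (nothing Hive-specific): changes made inside a transaction are invisible to others until commit() and can be undone with rollback().

```python
import sqlite3

# In-memory database; sqlite3 implicitly opens a transaction on the
# first modifying statement.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (x INTEGER)")
conn.commit()

# A transaction groups statements into an all-or-nothing unit.
conn.execute("INSERT INTO t VALUES (1)")
conn.rollback()  # undo the uncommitted insert
print(conn.execute("SELECT COUNT(*) FROM t").fetchone()[0])  # 0

conn.execute("INSERT INTO t VALUES (2)")
conn.commit()    # make the insert permanent
print(conn.execute("SELECT COUNT(*) FROM t").fetchone()[0])  # 1
```

With autocommit=True in pyodbc there is simply no pending transaction left open; and since Hive has no transactions in this sense, a plain SELECT through pd.read_sql cannot change the data on the server either way.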

How do you connect to an Oracle ADW cloud database via Python?

I found this website and it suggests that I might be able to connect to my Oracle ADW cloud database using Python. I tried running the code below but keep running into the same error. Does anyone have any insight on how to resolve this? Note: the password has been changed for obvious reasons.
Code in Jupyter Notebooks:
import cx_Oracle as cx
import pandas as pd
import warnings
warnings.filterwarnings('ignore')
pswd = 'ABC'
#Connect to Autonomous Data Warehouse
con = cx.connect(user = 'ADMIN', password = pswd)
query = 'SELECT * from TEST123'
data_train = pd.read_sql(query, con=con)
Error:
DatabaseError: Error while trying to retrieve text for error ORA-01804
I get the same error when I run the below code:
...
#Connect to Autonomous Data Warehouse
con = cx.connect('ADMIN',pswd,"mltest_high")
query = 'SELECT * from TEST123'
data_train = pd.read_sql(query, con=con)
This took a lot of learning to figure out, especially how Oracle wallets work together with the sqlnet.ora and tnsnames.ora files and the system environment variables, but this website did get my Python notebook (.ipynb) in Visual Studio Code to connect to Oracle's cloud ADW system. I did almost exactly what it describes to get it to work on my machine, except that I didn't use the virtual environment; I worked around the items above by pointing the wallet directory entry at the wallet's location on my system instead.
A few things are important to get this to work. When you download the wallet from ADW, copy the high/medium/low entries from its tnsnames.ora into your Oracle/network/admin/tnsnames.ora file. You will also need to take the wallet location and SSL server lines from the wallet's sqlnet.ora and put them in the sqlnet.ora file in the Oracle/network/admin/ directory. If you choose not to use the virtual environment demonstrated in the post, you'll need to point the wallet directory line at the directory of the wallet folder. I unzipped mine; I'm unsure whether that is required.
Lastly, you will need to set your TNS_ADMIN system environment variable to wherever your system tnsnames.ora and sqlnet.ora files are (not the ones that come in the wallet download folder), likely Oracle\network\admin.
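As a concrete sketch of the sqlnet.ora edit (the directory path is a placeholder; check the exact lines against the sqlnet.ora shipped inside your own wallet download), the file in Oracle/network/admin typically ends up containing something like:

```
WALLET_LOCATION = (SOURCE = (METHOD = file) (METHOD_DATA = (DIRECTORY = "C:\path\to\unzipped\wallet")))
SSL_SERVER_DN_MATCH = yes
```

The DIRECTORY value is the part I had to change to point at my unzipped wallet folder.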
Below is the code that worked for me. I hope this helps someone else and that they don't have to go through the same hoops that I had to in order to figure it out.
import cx_Oracle
import os
import pandas as pd
print(os.environ.get('TNS_ADMIN'))  # sanity-check that the wallet/config directory is picked up
connection = cx_Oracle.connect('<Oracle ADW Username>', '<Oracle ADW Password>', '<TNS_NAME entry (high/med/low)>')
cursor = connection.cursor()
rs = cursor.execute("SELECT * FROM TEST123")
df = pd.DataFrame(rs.fetchall())
df
At a guess from the error number, and the fact that the message text wasn't found: cx_Oracle is using Oracle Instant Client libraries, but you have the ORACLE_HOME environment variable set to some other software. If so, unset ORACLE_HOME. Or perhaps you are using libraries included in a local Oracle DB install and haven't fully set the Oracle environment variables, e.g. haven't set ORACLE_HOME. Or you might need a more recent version of the Oracle client libraries - get 19c libraries, e.g. Oracle Instant Client. Also check other Stack Overflow questions about ORA-1804. If you update your question with information about what Oracle software is installed on the computer running Python, a more detailed answer might be possible.
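A quick way to see which of these environment variables are set before digging further (a diagnostic sketch, not a fix):

```python
import os

# Print the environment variables that influence which Oracle client
# libraries and config files get picked up.
for var in ("ORACLE_HOME", "TNS_ADMIN", "PATH"):
    value = os.environ.get(var)
    print(f"{var} = {value if value is not None else '<not set>'}")
```

If ORACLE_HOME points at unrelated software while Instant Client is on PATH, that mismatch is consistent with the ORA-01804 symptom described above.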
It sounds like you have got the cloud wallet sorted out for connection, but here are references for people coming to this question after reading your heading:
A blog post How to connect to Oracle Autonomous Cloud Databases
cx_Oracle documentation Connecting to Autonomous Databases
Oracle ADW documentation: Connect with Python, Node.js, and other Scripting Languages

Connecting to jTDS Microsoft server with SQLalchemy and Presto

I'm trying to connect to an old-school MS SQL Server (via jTDS) for a variety of different analysis tasks - first just using Python with SQLAlchemy, and later Tableau and Presto as well.
Focusing on SQLAlchemy first, at the moment I'm getting the error:
Data source name not found and no default driver specified
My connection code is based on this thread: Connecting to SQL Server 2012 using sqlalchemy and pyodbc, i.e.:
import urllib.parse
import sqlalchemy as sa

params = urllib.parse.quote_plus("DRIVER={FreeTDS};"
                                 "SERVER=x-y.x.com;"
                                 "DATABASE=;"
                                 "UID=user;"
                                 "PWD=password")
engine = sa.create_engine("mssql+pyodbc:///?odbc_connect={}".format(params))
Connecting works fine through Dbeaver, using a jTDS SQL Server (MSSQL) driver (which is labelled as legacy).
Curious as to how to resolve this issue, I'll keep researching away, but would appreciate any help.
I imagine there is an old driver on the internet I need to integrate with SQLAlchemy to begin with, and then perhaps migrate this data to something newer.
Appreciate your time
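As an aside on the encoding step (a minimal, stdlib-only sketch; the DSN values are dummies): quote_plus just percent-encodes the semicolon-delimited ODBC string so it can ride inside the SQLAlchemy URL after odbc_connect=, and the URL template needs an empty positional placeholder {} for .format(params) to work.

```python
import urllib.parse

# Dummy DSN-less ODBC connection string.
raw = "DRIVER={FreeTDS};SERVER=x-y.x.com;UID=user;PWD=password"

# Percent-encode it so '=', ';', '{' and '}' survive inside the URL.
params = urllib.parse.quote_plus(raw)
url = "mssql+pyodbc:///?odbc_connect={}".format(params)
print(url)
```

If the encoding is right but the "Data source name not found" error persists, the ODBC driver name in DRIVER={...} is usually what doesn't match the driver registered in odbcinst.ini.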

python apache phoenix jdbc connection

I'm attempting to connect to Phoenix-Hbase via JDBC. Actually I already have a Phoenix connection through DBeaver and I'm trying to replicate it in python. I tried:
import jaydebeapi
conn = jaydebeapi.connect("org.apache.phoenix.jdbc.PhoenixDriver",
                          "jdbc:phoenix:host_name",
                          r'C:\Users\XXXX\Desktop\phoenix-core-4.5.2-HBase-0.98.jar')
but this returns me
java.lang.RuntimeException: Class org.apache.phoenix.jdbc.PhoenixDriver not found
Well, I double-checked the class name and the Phoenix and HBase versions, and noticed no anomalies. So I looked at DBeaver's Phoenix driver settings, which show a very long list of driver jars; I don't really know how to get them all at once or how to use them in code.
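One thing worth trying (an untested sketch; the folder path is hypothetical): jaydebeapi's jars argument accepts a list, so every jar from DBeaver's driver definition can be put on the classpath at once. Note also that in jaydebeapi 0.6+ the third positional argument is driver_args, not the jar path, so the jar(s) are safest passed by keyword.

```python
import glob

# Hypothetical folder holding all the jars that DBeaver's Phoenix
# driver definition lists (adjust to wherever DBeaver stored them).
jars = glob.glob(r"C:\Users\XXXX\Desktop\phoenix-jars\*.jar")

# With jaydebeapi >= 0.6 the signature is
# connect(jclassname, url, driver_args=None, jars=None), so:
# import jaydebeapi
# conn = jaydebeapi.connect(
#     "org.apache.phoenix.jdbc.PhoenixDriver",
#     "jdbc:phoenix:host_name",
#     jars=jars,
# )
```

If the jar path ends up in driver_args instead of jars, the driver never reaches the classpath, which would produce exactly the "Class org.apache.phoenix.jdbc.PhoenixDriver not found" error above.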

SQLAlchemy hangs while connecting to SQL Azure, but not always

I have a Django application which uses SQLAlchemy to connect to a SQL Server instance on Windows Azure. The app worked perfectly for 3 months on a local SQL Server instance, and for over a month on an Azure instance. The issues appeared this Monday, after a week without any code changes.
The site uses:
Python 2.7
Django 1.6
Apache/Nginx
SQLAlchemy 0.9.3
pyODBC 3.0.7
FreeTDS
The application appears to lock up right after a connection is pulled out of the Pool (I have setup verbose logging at every point in the workflow). I assumed this had something to do with the connections going stale. So we tried making the pool_recycle incredibly short (5 secs), all the way up to an hour. That did not help.
We also tried using the NullPool to force a new connection on every page view. However that does not help either. After about 15 minutes the site will completely lock up again (meaning no pages that use the database are viewable).
The weird thing is, half the computers that experience the "hang", will end up loading the page about 15 minutes later.
Has anyone had any experience with SQL Azure and SQLAlchemy?
I found a workaround for this issue. Please note that this is definitely not a fix, since the site worked perfectly fine before. We could not determine what the actual issue is because SQL Azure has no error log (one of the 100 reasons I would suggest never considering SQL Azure over a real database server).
I got around the problem by turning off all Connection Pooling, at the application level, AND at the driver level.
Things started consistently working after making my /etc/odbcinst.ini look like:
[FreeTDS]
Description = TDS driver (Sybase/MS SQL)
# Some installations may differ in the paths
Driver = /usr/lib/x86_64-linux-gnu/odbc/libtdsodbc.so
Setup = /usr/lib/x86_64-linux-gnu/odbc/libtdsS.so
CPReuse =
CPTimeout = 0
FileUsage = 1
Pooling = No
The key is setting CPTimeout (Connection Pool Timeout) to 0, and Pooling to No. Just turning pooling off at the application level (in SQLAlchemy) did not work; only after setting it at the driver level did things start working smoothly.
I am now at 4 days without a problem after changing that setting.
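For reference, the application-level half of this (which by itself was not enough in my case) is done by passing NullPool to create_engine. A minimal sketch with a current SQLAlchemy against an in-memory SQLite database; the Azure/pyodbc URL is the obvious substitution:

```python
from sqlalchemy import create_engine, text
from sqlalchemy.pool import NullPool

# NullPool opens a fresh DBAPI connection on every checkout and closes
# it on release - no connection reuse at the SQLAlchemy level at all.
engine = create_engine("sqlite://", poolclass=NullPool)

with engine.connect() as conn:
    print(conn.execute(text("SELECT 1")).scalar())  # 1
```

With NullPool in place, any remaining connection reuse has to be coming from a lower layer, which is why the odbcinst.ini settings above mattered.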
