Connect to Impala using impyla client with Kerberos auth - python

I'm on a Windows 8 machine, where I use Python (Anaconda distribution) to connect to Impala in our Hadoop cluster using the impyla package. Our Hadoop cluster is secured via Kerberos. I followed the API reference to configure the connection.
from impala.dbapi import connect
conn = connect(host='localhost', port=21050, auth_mechanism='GSSAPI',
               kerberos_service_name='impala')
We are using Kerberos GSSAPI with SASL:
auth_mechanism='GSSAPI'
I have managed to install the python-sasl library on Windows 8, but I still encounter this error:
Could not start SASL: Error in sasl_client_start (-4) SASL(-4): no mechanism available: No worthy mechs found (code THRIFTTRANSPORT): TTransportException('Could not start SASL: Error in sasl_client_start (-4) SASL(-4): no mechanism available: No worthy mechs found',)
I wonder if I am still missing some dependencies.

Install the kerberos Python package; that should fix your issue.

I ran into the same issue, but I fixed it by installing the right versions of the required libraries.
Install the following Python libraries using pip:
six==1.12.0
bit_array==0.1.0
thrift==0.9.3
thrift_sasl==0.2.1
sasl==0.2.1
impyla==0.13.8
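If you want to confirm what is actually installed before trying to connect, a small check along these lines might help. This is only a sketch: it assumes Python 3.8 or newer (for importlib.metadata), the pins are simply copied from the list above, and the helper names find_mismatches and installed_version are made up for illustration.

```python
from importlib import metadata

# Pins copied from the list above; adjust them to your own environment.
PINS = {
    "six": "1.12.0",
    "bit_array": "0.1.0",
    "thrift": "0.9.3",
    "thrift_sasl": "0.2.1",
    "sasl": "0.2.1",
    "impyla": "0.13.8",
}


def installed_version(name):
    """Return the installed version of a distribution, or None if absent."""
    try:
        return metadata.version(name)
    except metadata.PackageNotFoundError:
        return None


def find_mismatches(pins, lookup=installed_version):
    """Map each package whose installed version differs from its pin to a
    (wanted, found) tuple; found is None when the package is not installed."""
    mismatches = {}
    for name, wanted in pins.items():
        found = lookup(name)
        if found != wanted:
            mismatches[name] = (wanted, found)
    return mismatches


if __name__ == "__main__":
    for name, (wanted, found) in find_mismatches(PINS).items():
        print(f"{name}: wanted {wanted}, found {found}")
```

An empty result means every pinned package is present at the pinned version. It says nothing about whether the system-level SASL/GSSAPI libraries are installed.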
The code below works with Python 2.7 and 3.4.
import os
import ssl
from impala.dbapi import connect

# Obtain a Kerberos ticket first (kinit prompts for the password).
os.system("kinit")
conn = connect(host='hostname.io', port=21050, use_ssl=True, database='default',
               user='urusername', kerberos_service_name='impala',
               auth_mechanism='GSSAPI')
cur = conn.cursor()
cur.execute('SHOW DATABASES;')
result = cur.fetchall()
for data in result:
    print(data)
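Note that os.system("kinit") does not tell you whether a ticket was actually obtained. One way to check before connecting, sketched here under the assumption that an MIT Kerberos klist is on the PATH (its -s flag runs silently and reports the result through the exit status; the klist that ships with Windows behaves differently), is the following. The function name has_valid_ticket is my own.

```python
import subprocess


def has_valid_ticket(run=subprocess.run):
    """Return True if `klist -s` exits with status 0, meaning a readable,
    unexpired credential cache exists (MIT Kerberos behaviour)."""
    try:
        completed = run(["klist", "-s"])
    except FileNotFoundError:
        # klist is not on PATH, so a ticket cannot be confirmed.
        return False
    return completed.returncode == 0
```

You could call has_valid_ticket() right before connect() and fail early with a clearer message than the SASL error.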

Try this to list the tables on a Kerberized cluster (in my case CDH-5.14.2-1).
Make sure you have a valid ticket before running this code.
I used Python 2.7 with the following packages:
thrift-0.9.3
thriftpy-0.3.8
thrift_sasl-0.3.0
impyla==0.14.2.2
Working code:
from impala.dbapi import connect
from impala.util import as_pandas

# 21050 is the Impala daemon's HiveServer2 port, which impyla uses
# (21000 is the Beeswax port used by impala-shell).
conn = connect(host='yourHost', port=21050, auth_mechanism='GSSAPI')
cursor = conn.cursor()
cursor.execute("SHOW TABLES")
# After .execute(), Impala keeps the result set on the server until it is
# fetched. .fetchall() pulls the entire result set over the network, so use
# it only when you know the data set is small.
tables = cursor.fetchall()
print("Displaying list of tables")
# The result is a list of tuples.
for t in tables:
    # Each row of a SHOW TABLES result should contain only one table name.
    print(t[0])
    # exit()  # enable to stop after the first table
print("eol >>>")
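For result sets that may not be small, fetching in batches is a common alternative to fetchall(). The sketch below relies only on the DB-API fetchmany() method, which I am assuming the impyla cursor supports like any other DB-API 2.0 cursor. An in-memory sqlite3 database stands in for Impala so the example runs anywhere, and iter_rows is a name I made up.

```python
import sqlite3


def iter_rows(cursor, batch_size=1000):
    """Yield rows one at a time while fetching them in batches."""
    while True:
        batch = cursor.fetchmany(batch_size)
        if not batch:
            break
        for row in batch:
            yield row


# Stand-in for an Impala cursor: any DB-API 2.0 cursor is used the same way.
demo = sqlite3.connect(":memory:")
cur = demo.cursor()
cur.execute("CREATE TABLE t (name TEXT)")
cur.executemany("INSERT INTO t VALUES (?)", [("a",), ("b",), ("c",)])
cur.execute("SELECT name FROM t ORDER BY name")
names = [row[0] for row in iter_rows(cur, batch_size=2)]
print(names)  # ['a', 'b', 'c']
```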

For me, installing this package fixed it: libsasl2-modules-gssapi-mit

For me, the following connection parameters worked. I did not have to install any additional Python packages.
connect(host="your_host", port=21050, auth_mechanism='GSSAPI', timeout=100000, use_ssl=False, ca_cert=None, ldap_user=None, ldap_password=None, kerberos_service_name='impala')

To connect to Impala using Python, you can follow the steps below:
Install the Cloudera ODBC Driver for Impala.
Create a DSN using the 64-bit ODBC driver and enter your server details.
Use the code snippet below to connect:
import pandas as pd
import pyodbc
with pyodbc.connect("DSN=impala_con", autocommit=True) as conn:
    # Put your SQL query in the empty string.
    df = pd.read_sql("", conn)

Python cannot connect to HiveServer2
Make sure you install cyrus-sasl-devel and cyrus-sasl-gssapi.

Related

Connecting R to Oracle DB without admin

I need an R script that allows me to connect to an Oracle DB without having to install anything that needs admin rights, and preferably nothing at all apart from package downloads. In Python the following code works, I believe because it uses the cx_Oracle module as a portable driver. What would be a good R alternative?
import pandas as pd
import sqlalchemy
import sys
host = "xxx.intra"
database = "mydb"
user = "usr"
password = "pw"
def get_oracle_engine(host, database, user, password):
    return sqlalchemy.create_engine("oracle+cx_oracle://{user}:{password}@{host}:1521/?service_name={database}".format(host=host, database=database, user=user, password=password))
engine = get_oracle_engine(host, database, user, password)
pd.read_sql_table("mytable", engine, schema="mydb", index_col="id1")
I managed to install ROracle using the CRAN instructions, but I keep getting "ORA-12154: TNS:could not resolve the connect identifier specified" when using:
library(ROracle)
con = DBI::dbConnect(dbDriver("Oracle"), user=user, password=password, host=host, dbname=database, port="1521")
By the way, dbDriver("Oracle") returns:
Driver name : Oracle (OCI)
Driver version: 1.3-1
Client version: 12.1.0.2.0
Try code like:
library(DBI)
library(ROracle)
drv <- Oracle()
con <- dbConnect(drv, 'cj', 'welcome', 'localhost:1521/orclpdb1')
dbGetQuery(con,"select count(*) from dual")
The connect string components correspond to the {host}:1521/?service_name values you used with SQLAlchemy. Use a TNS alias or an Easy Connect string, the same as with other C-based Oracle drivers, e.g. https://cx-oracle.readthedocs.io/en/latest/user_guide/connection_handling.html#connection-strings
The current ROracle code is at https://www.oracle.com/database/technologies/roracle-downloads.html. There are some packaging glitches with uploading to CRAN, and the CRAN maintainers haven't been responsive about resolving them.
ROracle still needs Oracle Client libraries such as from Oracle Instant Client.
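To make the mapping concrete, here is how the host, port and service name from the question fit into an Easy Connect string of the form host:port/service_name. It is sketched in Python to match the question's snippet; the function name easy_connect is made up, and the same string is what you would pass as the last argument to dbConnect in R.

```python
def easy_connect(host, service_name, port=1521):
    """Build an Oracle Easy Connect string: host:port/service_name."""
    return f"{host}:{port}/{service_name}"


print(easy_connect("localhost", "orclpdb1"))  # localhost:1521/orclpdb1
```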

How to check connection from Python to Neo4j

I'm building a microservice in Python 3.7 that connects to a Neo4j database. This is my first time connecting Python to Neo4j, and I'm using py2neo version 4.3.0.
Everything works OK, but to adhere to our standard I now need to create a health check that verifies the connection to the database.
I wanted to use
from py2neo import Graph, Database
and then
db = Database("bolt://localhost:7474", auth=("neo4j", "xxxx"))
and
db.kernel_version (doesn't work)
but with this I cannot verify that the connection is up. Does anybody have any suggestions?
If checking the kernel version doesn't work then the connection is not ok. Below is a script to check if the connection from python to neo4j (via py2neo) is up and running.
from py2neo import Graph
graph = Graph("bolt://localhost:7687", auth=("neo4j", "xxxxx"))
try:
    graph.run("Match () Return 1 Limit 1")
    print('ok')
except Exception:
    print('not ok')
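For a health check endpoint you usually want a value to return rather than a print. A minimal sketch of that idea follows; it takes the query function as a parameter (for py2neo you would pass graph.run), so it does not depend on any particular driver. The name check_connection and the (ok, detail) return shape are my own choices, not part of py2neo.

```python
def check_connection(run_query, query="RETURN 1"):
    """Run a trivial query and report whether it succeeded.

    run_query is any callable that executes a query string, for example
    graph.run from a py2neo Graph (an assumption about how you wire it up).
    Returns (True, "ok") on success or (False, "<ErrorType>: <message>").
    """
    try:
        run_query(query)
    except Exception as exc:
        return False, f"{type(exc).__name__}: {exc}"
    return True, "ok"
```

Depending on the driver version, creating the Graph object may itself try to connect and raise, so it is probably safest to construct it inside the health check's own try block as well.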

Sybase IQ connection in Python

I've spent a few days trying to determine how to connect to a Sybase IQ database through Python 3.6. I've tried pyodbc and pymssql, to no avail. Below are two code snippets that I've been working on, which don't seem to work, no matter what I try.
pyodbc:
conn = pyodbc.connect(driver='{SQL Server Native Client 11.0}',
                      server=server,
                      database=database,
                      port=port,
                      uid=user,
                      pwd=pwd)
pymssql:
conn = pymssql.connect(server=server,
                       port=port,
                       user=user,
                       password=pwd,
                       database=database)
I've also read that FreeTDS could be the solution for connecting to a Sybase IQ database; I thought it was installed as part of the pymssql package, but I can't seem to figure out how to leverage it. Any help would be greatly appreciated!
EDIT: I am aware that sqlanydb exists; however, this package makes me downgrade to Python 2.7. My stack is 3.6 and I'd like to not have to move off of that.
After some time, I was able to resolve this issue (on Windows). First, install the SQL Anywhere 17 driver. Once it is installed, open the Windows ODBC Data Sources window and set up a connection using the SQL Anywhere 17 driver and your Sybase IQ credentials. Once that has been configured and successfully tested, you can use the code snippet below to connect:
from sqlalchemy import create_engine
sybase_connection_string = "sqlalchemy_sqlany://{user}:{pwd}@{host}:{port}/{db}".format(
    user=user, pwd=pwd, host=host, port=port, db=database)
engine = create_engine(sybase_connection_string)
connection = engine.connect()
I believe you will need the sqlalchemy_sqlany module installed via pip, as well as sqlalchemy.
Alternatively, use the jconn4 or jconn3 JDBC driver.
Example connection:
import jaydebeapi
jar_path = "/drive/jconn4.jar"
driver_name = "com.sybase.jdbc4.jdbc.SybDriver"
_ipad = '1.1.1.1'
_port='2638'
con_prop= { "user": 'user', "password": 'pwd'}
connection_url = f"jdbc:sybase:Tds:{_ipad}:{_port}"
conn= jaydebeapi.connect(driver_name, connection_url,con_prop, jar_path)
You can use jconn4.jar to connect to Sybase IQ.
I was able to connect with SAP IQ/16.1.080.1841
To get jconn4.jar, use DBeaver to connect to a Sybase database. DBeaver will download the jar, which you can then reuse. You can download the Community Edition from the official site: https://dbeaver.io/
This requires Java to run. I used JDK 1.8.0_181.
Install jaydebeapi for your Python with pip install jaydebeapi.
I used Python 3.11.0 and jaydebeapi==1.2.3.
Once you have this, connect like below:
import jaydebeapi
jconn4_file_path = '<path/to/jconn4.jar>'
driver = 'com.sybase.jdbc4.jdbc.SybDriver'
db_server = '<server hostname>'
db_port = <port>
db_user = '<database username>'
db_password = '<database password>'
db_name = '<database name>'
connection_string = f'jdbc:sybase:Tds:{db_server}:{db_port}?ServiceName={db_name}'
connection = jaydebeapi.connect(
    driver,
    connection_string,
    [db_user, db_password],
    jconn4_file_path
)
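Both jaydebeapi answers build the same kind of URL, once with and once without the ServiceName parameter. If you want that in one place, a small helper along these lines might do; the function name sybase_jdbc_url is made up, and the URL shapes are taken directly from the two snippets above.

```python
def sybase_jdbc_url(host, port, db_name=None):
    """Build a jConnect URL: jdbc:sybase:Tds:host:port[?ServiceName=db]."""
    url = f"jdbc:sybase:Tds:{host}:{port}"
    if db_name:
        # ServiceName selects the database, as in the second snippet above.
        url += f"?ServiceName={db_name}"
    return url


print(sybase_jdbc_url("1.1.1.1", 2638))  # jdbc:sybase:Tds:1.1.1.1:2638
```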

cx_Oracle LDAP Connection String syntax

With JDBC, we can use the following syntax to connect to an Oracle database over an LDAP connection:
jdbc:oracle:thin:@ldap://host:1234/service_name,cn=OracleContext,dc=org,dc=com
How can I connect over LDAP using cx_Oracle?
I ended up going with jaydebeapi.
import pandas as pd
import jaydebeapi
import jpype
import os
import sys
def run(f_name, command, username, pw):
    jar = 'ojdbc8.jar'
    args = '-Djava.class.path=%s' % jar
    jvm_path = jpype.getDefaultJVMPath()
    jpype.startJVM(jvm_path, args)
    con = jaydebeapi.connect("oracle.jdbc.driver.OracleDriver", "jdbc:oracle:thin:@ldap://server.prod.company.com:3060/service,cn=OracleContext,dc=prod,dc=company,dc=com", [username, pw], jar)
    try:
        df = pd.read_sql(command, con)
        df.to_excel(f_name)
        print(df)
    except Exception as e:
        print(e)
    finally:
        con.close()
def Run_Program(myvars):
    # sys._MEIPASS is the temporary folder PyInstaller unpacks the bundle into.
    os.chdir(sys._MEIPASS)
    f_name = myvars.MyFileName
    command = myvars.plainTextEdit_CSVString.toPlainText()
    username = myvars.lineEdit_UserName.text()
    pw = myvars.lineEdit_Password.text()
    run(f_name, command, username, pw)
I saved the ojdbc8.jar file from the Oracle Client in the same folder and specified its location in the code. I also downgraded the JPype1 module to JPype1==0.6.3 (it is installed as a requirement of jaydebeapi).
This worked well for packaging with PyInstaller so that it could be shared (I created a PyQt5 UI for users).
Here are my two cents using Python 3.7 and cx_Oracle v.8.2.0 on Win 10.
I wanted to issue queries to an Oracle database using Python, and what I already had was:
a username (or schema)
a password
a JDBC connection string that looked like:
jdbc:oracle:thin:@ldap://[LDAPHostname1]:[LDAPPort1]/[ServiceName],[DomainContext] ldap://[LDAPHostname2]:[LDAPPort2]/[ServiceName],[DomainContext]
where the [DomainContext] was of the form cn=OracleContext,dc=foo,dc=bar
First, you have to install cx_Oracle by following the Oracle documentation.
Note that:
cx_Oracle requires a series of library files that are part of the Oracle Instant Client "Basic" or "Basic Light" package (available here). Let's say we unzip the package under C:\path\to\instant_client_xx_yy
Depending on the platform you're on, some other requirements must be met (such as installing a Visual Studio redistributable on Windows).
For the LDAP part, there are two configuration files that are required:
sqlnet.ora: This is the profile configuration file for Oracle; mine simply contained:
NAMES.DIRECTORY_PATH = (LDAP)
It tells the library to resolve names using LDAP only.
ldap.ora: This file tells the client where to look when resolving names using LDAP. I knew I was accessing two OID servers, so mine was of the form:
DIRECTORY_SERVERS=([LDAPHostname1]:[LDAPPort1], [LDAPHostname2]:[LDAPPort2])
DEFAULT_ADMIN_CONTEXT="dc=foo,dc=bar"
DIRECTORY_SERVER_TYPE=oid
Important note: I had to remove cn=OracleContext from the DEFAULT_ADMIN_CONTEXT entry to make the name resolution work.
Let's say those two files were saved under C:\path\to\conf
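If you would rather generate the two files than write them by hand, something like the sketch below should produce the same contents as shown above. The function name write_ldap_config is my own, and the host names in the usage line are placeholders, not real servers.

```python
from pathlib import Path


def write_ldap_config(config_dir, servers, admin_context):
    """Write sqlnet.ora and ldap.ora into config_dir.

    servers is a list of (host, port) pairs for the LDAP/OID servers.
    admin_context is e.g. "dc=foo,dc=bar" (without cn=OracleContext).
    """
    config_path = Path(config_dir)
    config_path.mkdir(parents=True, exist_ok=True)

    # Resolve names using LDAP only.
    (config_path / "sqlnet.ora").write_text("NAMES.DIRECTORY_PATH = (LDAP)\n")

    server_list = ", ".join(f"{host}:{port}" for host, port in servers)
    ldap_ora = (
        f"DIRECTORY_SERVERS=({server_list})\n"
        f'DEFAULT_ADMIN_CONTEXT="{admin_context}"\n'
        "DIRECTORY_SERVER_TYPE=oid\n"
    )
    (config_path / "ldap.ora").write_text(ldap_ora)
    return config_path
```

For example, write_ldap_config(r'C:\path\to\conf', [("ldap1.example.com", 389), ("ldap2.example.com", 389)], "dc=foo,dc=bar") would create both files in that folder.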
And now comes the Python part. I used the cx_Oracle.init_oracle_client() method to point to the library and configuration files. (Note that there are other ways to give cx_Oracle access to those files, such as setting environment variables or putting them in predefined locations. This is explained in the installation guide.)
Here is a little sample code:
import cx_Oracle
# username and password retrieved here
cx_Oracle.init_oracle_client(lib_dir=r'C:\path\to\instant_client_xx_yy', config_dir=r'C:\path\to\conf')
try:
    with cx_Oracle.connect(user=username, password=password, dsn='[ServiceName]') as connection:
        cursor = connection.cursor()
        cursor.execute('SELECT * FROM ALL_TAB_COLUMNS')
        # Outputs tables and columns accessible by the user
        for row in cursor:
            print(row[1], '-', row[2])
        cursor.close()
except cx_Oracle.DatabaseError as e:
    print("Oracle Error", e)
The short answer is that you use an ldap.ora configuration file and specify that it is to be used in your sqlnet.ora configuration file. Although this link talks about creating a database link and not directly connecting, the same principle applies and you can connect using any of the services referenced in your LDAP server.
http://technologydribble.info/2015/02/10/how-to-create-an-oracle-database-link-using-ldap-authentication/
Some more official documentation on how it works can be found here:
https://docs.oracle.com/cd/B28359_01/network.111/b28317/ldap.htm

pyodbc.Error 'IM002' connecting to DB2

I downloaded Python 2.7 (python-2.7.1.amd64.msi) and pyodbc, the python extension module for connecting to DB2 database (i.e. pyodbc-2.1.8.win-amd64-py2.7.exe).
I wrote sample script as shown below.
import csv
import pyodbc
conn = pyodbc.connect('DRIVER={DB2};SERVER=localhost;DATABASE=DBT1;UID=scott;PWD=tiger;')
curs = conn.cursor()
curs.execute('select count(edokimp_id) from edokimp')
print curs.fetchall()
The script throws following error
pyodbc.Error: ('IM002', '[IM002] [Microsoft][ODBC Driver Manager] Data source name not found and no default driver specified (0) (SQLDriverConnectW)')
As I am a newbie to Python, I gathered from the error that I need to download the IBM DB2 driver for pyodbc, so I searched extensively on Google but couldn't find one.
I would greatly appreciate it if you could point me to a site where I can download the driver and explain how to configure and load it.
In the case of Java:
the driver is shipped as ojdbc.jar, which is copied to the lib directory on the classpath
make changes to the configuration file
reference the DataSource from the Java class
I am a newbie to Python, so I would greatly appreciate it if you could let me know the corresponding steps, with an example, in Python.
You can get the PyDB2 driver on the project homepage.
If you run into compilation issues with the official Python, ActivePython is a good alternate distribution of Python on Windows.
Edit: If it asks you for DB2 headers, you need to get the IBM Data Server Client for ODBC and CLI.
It does work using pyodbc. I think your connection string is wrong. After some research and tests, I solved it with this code:
con = pyodbc.connect('DRIVER=iSeries Access ODBC Driver;SYSTEM=10.0.0.1;UID=bubi;PWD=xyz;DBQ=DEFAULTSCHEMA;EXTCOLINFO=1')
cur = con.cursor()
cur.execute('select * from MYTABLE')
row = cur.fetchone()
if row:
    field1 = row[0]
    field2 = row[1]
    # etc...
As you see it doesn't need a DSN to be configured on your system.
This connection string for pyodbc works for me:
conexion_str = 'SYSTEM=%s;db2:DSN=%s;UID=%s;PWD=%s;DRIVER=%s;' % (self._SYSTEM, self._DSN, self._UID, self._PWD, self._DRIVER)
self._cnn = pyodbc.connect(conexion_str)
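Since IM002 means the driver manager could not find the driver or DSN you named, it can help to compare the DRIVER= value in your connection string with the drivers that are actually installed. Here is a rough sketch of that check. The helper names are made up, and I am assuming a reasonably recent pyodbc, where pyodbc.drivers() returns the installed driver names; the list is passed in as a parameter so the helper itself has no dependency on pyodbc.

```python
import re


def driver_from_connection_string(conn_str):
    """Extract the DRIVER value from an ODBC connection string, or None."""
    match = re.search(
        r"(?:^|;)\s*DRIVER\s*=\s*\{?([^;}]+)\}?", conn_str, re.IGNORECASE
    )
    return match.group(1).strip() if match else None


def driver_is_installed(conn_str, installed_drivers):
    """Return True if the connection string names one of installed_drivers.

    installed_drivers would typically come from pyodbc.drivers().
    """
    name = driver_from_connection_string(conn_str)
    return name is not None and name in installed_drivers


conn_str = 'DRIVER={DB2};SERVER=localhost;DATABASE=DBT1;UID=scott;PWD=tiger;'
print(driver_from_connection_string(conn_str))  # DB2
```

If driver_is_installed(conn_str, pyodbc.drivers()) comes back False, the name in the connection string probably needs to be changed to match one of the listed drivers exactly.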
