query Kerberized Hive with SQL Alchemy - python

I'm trying to query a Kerberized Hive cluster with SQL Alchemy. I'm able to submit queries using pyhs2 which confirms that it's possible to connect and query Hive when authenticated by Kerberos:
import pyhs2
with pyhs2.connect(host='hadoop01.woolford.io',
port=10500,
authMechanism='KERBEROS') as conn:
with conn.cursor() as cur:
cur.execute('SELECT * FROM default.mytable')
records = cur.fetchall()
# etc ...
I notice that Airbnb's Airflow uses SQL Alchemy and can connect to Kerberized Hive and so I imagine it's possible to do something like this:
engine = create_engine('hive://hadoop01.woolford.io:10500/default', connect_args={'?': '?'})
connection = engine.connect()
connection.execute("SELECT * FROM default.mytable")
# etc ...
I'm not sure what parameters should be set in the connect_args dictionary. Can you see what needs to be added to make this work (e.g. Kerberos service name, realm, etc.)?
update:
Under the hood SQL Alchemy is using PyHive to connect to Hive. The current version of PyHive, v0.2.1, doesn't support Kerberos.
I notice that someone from Yahoo created a pull request that provides support for Kerberos. This PR has not yet been merged/released and so I just copied the code from the PR into /usr/lib/python2.7/site-packages/pyhive/hive.py on the Superset server created a connection like this:
engine = create_engine('hive://hadoop01:10500', connect_args={'auth': 'KERBEROS', 'kerberos_service_name': 'hive'})
Hopefully, the maintainer of PyHive will merge/release the support for Kerberos.

install these libraries
sasl
thrift
thrift-sasl
PyHive
get your kerberos ticket and then;
engine = create_engine('hive://HOST:10500/DB_NAME',
connect_args={'auth': 'KERBEROS', 'kerberos_service_name': 'hive'})
ps: /DB_NAME is optional

Related

How to connect Cloud SQL Server via external python program?

So I am trying to communicate to a Google Cloud SQL Server that I have created with an external python program that I have written in VS Code but I don't know where to begin. Any help will be useful.
I'd recommend using the Cloud SQL Python Connector to manage your connections to Cloud SQL. It supports the pytds driver and should help resolve your troubles for connecting to a SQL Server instance from a Python application.
from google.cloud.sql.connector import connector
import sqlalchemy
# configure Cloud SQL Python Connector properties
def getconn() ->:
conn = connector.connect(
"PROJECT:REGION:INSTANCE",
"pytds",
user="YOUR_USER",
password="YOUR_PASSWORD",
db="YOUR_DB"
)
return conn
# create connection pool to re-use connections
pool = sqlalchemy.create_engine(
"mssql+pytds://localhost",
creator=getconn,
)
# query or insert into Cloud SQL database
with pool.connect() as db_conn:
# query database
result = db_conn.execute("SELECT * from my_table").fetchall()
# Do something with the results
for row in result:
print(row)
For more detailed examples and additional params refer to the README of the repository.
I think you can be inspired by this :Python django
"Run the app on your local computer"

SQL Connect error in Python(Windows): severity 9:\nAdaptive Server connection failed

Not able to connect to Azure DB. I get the following error while connecting via Python.
I'm able to connect to my usual SQL environment
import pandas as pd
import pymssql
connPDW = pymssql.connect(host=r'dwprd01.database.windows.net', user=r'internal\admaaron',password='',database='')
connPDW.autocommit(True)
cursor = connPDW.cursor()
conn.autocommit(True)
cursor = conn.cursor()
sql = """
select Top (10) * from TableName
"""
cursor.execute(sql);
Run without errors.
Just according to your code, there is an obvious issue of connecting Azure SQL Database by pymssql package in Python which use the incorrect user format and lack of the values of password and database parameters.
Please follow the offical document Step 3: Proof of concept connecting to SQL using pymssql carefully to change your code correctly.
If you have an instance of Azure SQL Database with the connection string of ODBC, such as Driver={ODBC Driver 13 for SQL Server};Server=tcp:<your hostname>.database.windows.net,1433;Database=<your database name>;Uid=<username>#<host>;Pwd=<your_password>;Encrypt=yes;TrustServerCertificate=no;Connection Timeout=30; show in the Connection strings tab of your SQL Database on Azure portal.
Then, your code should be like below
hostname = '<your hostname>'
server = f"{hostname}.database.windows.net"
username = '<your username>'
user = f"{username}#{hostname}"
password = '<your password>'
database = '<your database name>'
import pymssql
conn = pymssql.connect(server=server, user=user, password=password, database=database)
Meanwhile, just additional note for the version of Azure SQL Database and MS SQL Server are 2008+ like the latest Azure SQL Database, you should use the ODBC Driver connection string which be started with DRIVER={ODBC Driver 17 for SQL Server};, not 13 show in the connection string of Azure portal if using ODBC driver for Python with pyodbc, please refer to the offical document Step 3: Proof of concept connecting to SQL using pyodbc.

How to connect to a SQL server linked server in Python

Normally, when trying to connect to a SQL sever DB in Python, I use the pyodbc package like this:
import pyodbc
conn = pyodbc.connect("Driver={SQL Server};"
"Server=<server-ip>;"
"Database=<DB-name>;"
"UID=<user-name>;"
"PWD=<password>;"
"Trusted_Connection=yes;"
)
However, I don't know how to connect to a linked server in Python. If my linked server is called linked-server and has a DB called linked-DB for example; I have tried the same connection string as above, and changing the database name like this: "Database=<linked-server>.<linked-DB>;", since that's how I query the linked server DB in SSMS. But this doesn't work in Python.
Thank you very much for your help.

Connect to Database on local host with sqlite3 in python

I have a database that I am running on my local machine which I can access through Microsoft SQL Server Manager Studio. I connect to this server "JIMS-LAPTOP\SQLEXPRESS" and then I can run queries through the manager. However I need to be able to connect to this database and work with it through python.
When I try to connect using sqlite3 like
conn = sqlite3.connect("JIMS-LAPTOP\SQLEXPRESS")
I get an unable to open database file error
I tried accessing the temporary file directly like this
conn = sqlite3.connect("C:\Users\Jim Notaro\AppData\Local\Temp\~vs13A7.sql")
c = conn.cursor()
c.execute("SELECT name FROM sqlite_master WHERE type = \"table\"")
print c.fetchall()
Which allows me to access a database but it is completely empty (No tables are displayed)
I also tried connecting like this
conn = sqlite3.connect("SQL SERVER (SQLEXPRESS)")
Which is what the name is in the sql server configuration manager but that also returns a blank database.
I'm not sure how I am suppose to be connecting to the database using python
You can't use sqlite3 to connect to SQL server, only to Sqlite databases.
You need to use a driver that can talk to MS SQL, like pyodbc.

Connecting to SQL Server 2012 using sqlalchemy and pyodbc

I'm trying to connect to a SQL Server 2012 database using SQLAlchemy (with pyodbc) on Python 3.3 (Windows 7-64-bit). I am able to connect using straight pyodbc but have been unsuccessful at connecting using SQLAlchemy. I have dsn file setup for the database access.
I successfully connect using straight pyodbc like this:
con = pyodbc.connect('FILEDSN=c:\\users\\me\\mydbserver.dsn')
For sqlalchemy I have tried:
import sqlalchemy as sa
engine = sa.create_engine('mssql+pyodbc://c/users/me/mydbserver.dsn/mydbname')
The create_engine method doesn't actually set up the connection and succeeds, but
iIf I try something that causes sqlalchemy to actually setup the connection (like engine.table_names()), it takes a while but then returns this error:
DBAPIError: (Error) ('08001', '[08001] [Microsoft][ODBC SQL Server Driver][DBNETLIB]SQL Server does not exist or access denied. (17) (SQLDriverConnect)') None None
I'm not sure where thing are going wrong are how to see what connection string is actually being passed to pyodbc by sqlalchemy. I have successfully using the same sqlalchemy classes with SQLite and MySQL.
The file-based DSN string is being interpreted by SQLAlchemy as server name = c, database name = users.
I prefer connecting without using DSNs, it's one less configuration task to deal with during code migrations.
This syntax works using Windows Authentication:
engine = sa.create_engine('mssql+pyodbc://server/database')
Or with SQL Authentication:
engine = sa.create_engine('mssql+pyodbc://user:password#server/database')
SQLAlchemy has a thorough explanation of the different connection string options here.
In Python 3 you can use function quote_plus from module urllib.parse to create parameters for connection:
import urllib
params = urllib.parse.quote_plus("DRIVER={SQL Server Native Client 11.0};"
"SERVER=dagger;"
"DATABASE=test;"
"UID=user;"
"PWD=password")
engine = sa.create_engine("mssql+pyodbc:///?odbc_connect={}".format(params))
In order to use Windows Authentication, you want to use Trusted_Connection as parameter:
params = urllib.parse.quote_plus("DRIVER={SQL Server Native Client 11.0};"
"SERVER=dagger;"
"DATABASE=test;"
"Trusted_Connection=yes")
In Python 2 you should use function quote_plus from library urllib instead:
params = urllib.quote_plus("DRIVER={SQL Server Native Client 11.0};"
"SERVER=dagger;"
"DATABASE=test;"
"UID=user;"
"PWD=password")
I have an update info about the connection to MSSQL Server without using DSNs and using Windows Authentication. In my example I have next options:
My local server name is "(localdb)\ProjectsV12". Local server name I see from database properties (I am using Windows 10 / Visual Studio 2015).
My db name is "MainTest1"
engine = create_engine('mssql+pyodbc://(localdb)\ProjectsV12/MainTest1?driver=SQL+Server+Native+Client+11.0', echo=True)
It is needed to specify driver in connection.
You may find your client version in:
control panel>Systems and Security>Administrative Tools.>ODBC Data
Sources>System DSN tab>Add
Look on SQL Native client version from the list.
Just want to add some latest information here:
If you are connecting using DSN connections:
engine = create_engine("mssql+pyodbc://USERNAME:PASSWORD#SOME_DSN")
If you are connecting using Hostname connections:
engine = create_engine("mssql+pyodbc://USERNAME:PASSWORD#HOST_IP:PORT/DATABASENAME?driver=SQL+Server+Native+Client+11.0")
For more details, please refer to the "Official Document"
import pyodbc
import sqlalchemy as sa
engine = sa.create_engine('mssql+pyodbc://ServerName/DatabaseName?driver=SQL+Server+Native+Client+11.0',echo = True)
This works with Windows Authentication.
I did different and worked like a charm.
First you import the library:
import pandas as pd
from sqlalchemy import create_engine
import pyodbc
Create a function to create the engine
def mssql_engine(user = os.getenv('user'), password = os.getenv('password')
,host = os.getenv('SERVER_ADDRESS'),db = os.getenv('DATABASE')):
engine = create_engine(f'mssql+pyodbc://{user}:{password}#{host}/{db}?driver=SQL+Server')
return engine
Create a variable with your query
query = 'SELECT * FROM [Orders]'
Execute the Pandas command to create a Dataframe from a MSSQL Table
df = pd.read_sql(query, mssql_engine())

Categories