Automating Hive with Python

I am running Hive 0.12, and I'd like to run several queries and get the results back as a Python array.
For example:
result = []
for col in columns:
    sql = 'select {c} as cat, count(*) as cnt from {t} group by {c} having cnt > 100;'.format(t=table, c=col)
    result.append(hive.query(sql))
result = dict(result)
What I'm missing is the hive class to run the SQL queries.
How can this be done?

One quick and dirty way to do this is to automate Hive from the command line:
hive -e "sql command"
Something like this should work:
import subprocess

def query(self, cmd):
    """Run a Hive expression via the CLI and return the rows."""
    cmd = 'hive -e "' + cmd + '"'
    prc = subprocess.Popen(cmd, stdout=subprocess.PIPE,
                           stderr=subprocess.PIPE, shell=True)
    stdout, stderr = prc.communicate()
    ret = stdout.split('\n')
    ret = [r for r in ret if len(r)]
    if len(ret) == 0:
        return []
    if ret[0].find('\t') > 0:
        return [[t.strip() for t in r.split('\t')] for r in ret]
    return ret
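To plug this into the loop from the question, the function just needs to live on an object so the self parameter is satisfied. A minimal sketch; the Hive class, table and column names here are only for illustration:
# attach the query() function above to a bare class so it becomes a method
class Hive(object):
    pass

Hive.query = query
hive = Hive()

result = []
for col in ['col_a', 'col_b']:                          # hypothetical columns
    sql = ('select {c} as cat, count(*) as cnt from my_table '
           'group by {c} having cnt > 100;').format(c=col)
    result.append(hive.query(sql))
print result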

You could also access Hive using Thrift. https://cwiki.apache.org/confluence/display/Hive/HiveClient#HiveClient-Python. It looks like pyhs2 is mostly a wrapper around using Thrift directly.
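For reference, the raw Thrift route against HiveServer1 looks roughly like the sketch below. The module names vary between Hive releases (hive, hive_service, or TCLIService for HiveServer2), so treat the imports as placeholders and follow the wiki page above for your version:
# Python 2; a rough sketch of the HiveServer1 Thrift client
from thrift.transport import TSocket, TTransport
from thrift.protocol import TBinaryProtocol
from hive_service import ThriftHive      # may be `hive` on older releases

transport = TSocket.TSocket('localhost', 10000)
transport = TTransport.TBufferedTransport(transport)
protocol = TBinaryProtocol.TBinaryProtocol(transport)
client = ThriftHive.Client(protocol)

transport.open()
client.execute('select * from my_table limit 10')
rows = client.fetchAll()                 # rows come back as strings
transport.close()
print rows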

One alternative is to use the pyhs2 library to open a connection to Hive natively from within a Python process. The following is some sample code I had cobbled together to test a different use case, but it should hopefully illustrate use of this library.
# Python 2.7
import pyhs2
from pyhs2.error import Pyhs2Exception

hql = "SELECT * FROM my_table"
with pyhs2.connect(
    host='localhost', port=10000, authMechanism="PLAIN",
    user="root", database="default"
    # Use your own credentials and connection info here of course
) as db:
    with db.cursor() as cursor:
        try:
            print "Trying default database"
            cursor.execute(hql)
            for row in cursor.fetch():
                print row
        except Pyhs2Exception as error:
            print str(error)
Depending on what is or is not already installed on your box, you may need to also install the development headers for both libpython and libsasl2.

Related

How to connect to snowflake using multiple SQL code in Python vscode?

I am trying to connect to the Snowflake database using Python. I have a .sql file in VS Code that contains multiple SQL statements. For example:
select * from table1;
select * from table2;
select * from table3;
So, I tried this code to get the results, but it returned an error:
"Multiple SQL statements in a single API call are not supported; use one API call per statement instead."
My Python code is:
#!/usr/bin/env python
import snowflake.connector

# Gets the version
ctx = snowflake.connector.connect(
    user='<user_name>',
    password='<password>',
    account='<account_identifier>'
)
cs = ctx.cursor()
try:
    with open('<file_directory>') as f:
        lines = f.readlines()
    cs.execute(lines)
    data_frame = cs.fetch_pandas_all()
    data_frame.to_csv('filename.csv')
finally:
    cs.close()
    ctx.close()
What can I try next?
Perhaps do as the error suggests, and limit each API call to a single SQL statement?
import pandas as pd

dfs = []
for line in lines:
    cs.execute(line)
    dfs.append(cs.fetch_pandas_all())
df = pd.concat(dfs)
df.to_csv('filename.csv')
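Note that readlines() returns one entry per physical line, so a statement that spans several lines (or a trailing blank line) will still break this. A slightly more robust sketch, assuming none of the statements contain a semicolon inside a string literal, is to split the whole file on ';':
import pandas as pd

with open('<file_directory>') as f:
    # drop empty fragments left by the trailing semicolon / blank lines
    statements = [s.strip() for s in f.read().split(';') if s.strip()]

dfs = []
for stmt in statements:
    cs.execute(stmt)
    dfs.append(cs.fetch_pandas_all())
pd.concat(dfs).to_csv('filename.csv')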

Export data from Oracle Database 12c using Python 2.7

I'm trying to export a table contained within an Oracle 12c database to CSV format using Python 2.7. The code I have written is shown below:
import os
import cx_Oracle
import csv

SQL = 'SELECT * FROM ORACLE_TABLE'

filename = r'C:\Temp\Python\Output.csv'
file = open(filename, 'w')
output = csv.writer(file, dialect='excel')

connection = cx_Oracle.connect('username/password@connection_name')
cursor = connection.cursor()
cursor.execute(SQL)
for i in cursor:
    output.writerow(i)

cursor.close()
connection.close()
file.close()
This code yields an error in the line where I define 'connection':
ORA-12557: TNS:protocol adapter not loadable
How can I remedy this? Any help would be appreciated.
Please note: I have already come across Stack Overflow answers to very similar problems. However, they often suggest changing the path in the environment variables, which I cannot do since I don't have the appropriate administrator privileges. Thanks again for your assistance.
ORA-12557 is caused by problems with %ORACLE_HOME% on Windows, which is why the usual suggestion is to change the PATH setting.
"I cannot do this since I don't have appropriate administrator privileges."
In that case you don't have too many options. Perhaps you could navigate to the ORACLE_HOME directory and run your script from there. Otherwise, look at what other tools you have available: Oracle SQL Developer? TOAD? SQL*Plus?
We found that the problem can be solved by navigating to config -> Oracle and editing the file 'tnsnames.ora'. The tnsnames file appears as follows:
connection_name =
  (DESCRIPTION =
    (ADDRESS_LIST =
      (ADDRESS = ... )
    )
    (CONNECT_DATA =
      (SERVICE_NAME = ...)
    )
  )
By changing the first instance of connection_name to connection_name.WORLD, then typing
set ORACLE_HOME=
into the command line before executing the Python script, the above script now runs with no error.
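For completeness, once the alias has been renamed (and ORACLE_HOME cleared in the shell as above), the connect call in the original script just references the new alias. A sketch, with placeholder credentials:
import cx_Oracle

# 'connection_name.WORLD' is the alias renamed in tnsnames.ora
connection = cx_Oracle.connect('username', 'password', 'connection_name.WORLD')
cursor = connection.cursor()
cursor.execute('SELECT * FROM ORACLE_TABLE')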
I use an ini file to store the DB connection parameters. Hope it helps.
self.mydsn = cx_Oracle.makedsn(self.parser.get('oracle', 'db'),
                               self.parser.get('oracle', 'port'),
                               self.parser.get('oracle', 'service_name'))
try:
    self.connpool = cx_Oracle.SessionPool(user=self.parser.get('oracle', 'username'),
                                          password=self.parser.get('oracle', 'userpass'),
                                          dsn=self.mydsn,
                                          min=1, max=5, increment=1)
except Exception as e:
    print e
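For context, self.parser in the snippet above is assumed to be a ConfigParser instance. A minimal sketch of that setup, with an example ini file whose section and keys match what the snippet reads (the file name and values are made up):
# Python 2.7
from ConfigParser import ConfigParser

parser = ConfigParser()
parser.read('oracle.ini')   # hypothetical file name

# oracle.ini:
# [oracle]
# db = dbhost.example.com
# port = 1521
# service_name = ORCL
# username = scott
# userpass = tiger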
You can use this Python script for Oracle CSV export:
https://github.com/teopost/csv_exp

MS Analysis Services OLAP API - Execute MDX query [duplicate]

I was able to connect to SQL Server Analysis Services in Python using Microsoft.AnalysisServices.dll, but now I can't execute a query on the cube.
I've tried the Execute method as follows:
amoServer.Execute('select from finance')
After issuing the Execute method I get this error:
<Microsoft.AnalysisServices.XmlaError object at 0x000000000000002B [Microsoft.AnalysisServices.XmlaError]>
Note: I'm using IronPython (Python 2.7) on 64-bit Windows Server.
What's the problem?
It's better to use Microsoft.AnalysisServices.AdomdClient.dll with an MDX query, and to put the query result into a DataSet from the System.Data assembly.
Something like this:
clr.AddReference ("Microsoft.AnalysisServices.AdomdClient.dll")
clr.AddReference ("System.Data")
from Microsoft.AnalysisServices.AdomdClient import AdomdConnection , AdomdDataAdapter
from System.Data import DataSet
conn = AdomdConnection("Data Source=0.0.0.0;Catalog=MyCatalog;")
conn.Open()
cmd = conn.CreateCommand()
cmd.CommandText = "your mdx query" # in your case 'select from finance'
adp = AdomdDataAdapter(cmd)
datasetParam = DataSet()
adp.Fill(datasetParam)
conn.Close();
# datasetParam hold your result as collection a\of tables
# each tables has rows
# and each row has columns
print datasetParam.Tables[0].Rows[0][0]
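If you need more than the first cell, a small follow-up sketch that walks every table, row and column of the DataSet:
# walk the whole result set
for table in datasetParam.Tables:
    for row in table.Rows:
        print [row[col] for col in table.Columns]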

How can I get the same results from cx_Oracle and SQL*Plus?

These are the commands for SQL*Plus:
SQL> set serveroutput on
SQL> exec where.my_package.ger_result('something');
something=1823655138
And this is the cx_Oracle version:
>>> c.callproc('where.my_package.ger_result', ('something',))
['something']
As you can see, the results are different.
I have no idea how to fix it. :[
import cx_Oracle

dsn_tns = cx_Oracle.makedsn('my_ip_address_server_next_port', 0000, 'sid')
db = cx_Oracle.connect('user', 'password', dsn_tns)
curs = db.cursor()
curs.callproc("dbms_output.enable")
curs.callproc('where.my_package.ger_result', ['something'])
statusVar = curs.var(cx_Oracle.NUMBER)
lineVar = curs.var(cx_Oracle.STRING)
while True:
    curs.callproc("dbms_output.get_line", (lineVar, statusVar))
    if statusVar.getvalue() != 0:
        break
    print lineVar.getvalue()
Sorry, I can't reproduce this one.
I don't have your PL/SQL package, so I used the following stored procedure instead:
CREATE OR REPLACE PROCEDURE p_do_somet (
    p_param IN VARCHAR2
) AS
BEGIN
    dbms_output.put_line(p_param || '=1823655138');
END;
/
I got the same output, something=1823655138, from SQL*Plus and from using the Python script in your answer.
If you're getting different results using SQL*Plus and cx_Oracle, then either your stored procedure is doing something very funny (I don't know what could cause it to do this), or your SQL*Plus session and Python script are not connecting to the same database and/or schema.
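One quick sanity check (a sketch using the cursor from the question) is to run the same query from both clients and compare which database and user you are actually connected to:
# run the equivalent query in SQL*Plus too and compare the output
curs.execute("select user, sys_context('userenv', 'db_name') from dual")
print curs.fetchone()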

How to insert into Postgres with psycopg2

How can I fix the SQL statement in Python?
The DB connection works. However, cur.execute returns None, which is falsy.
My code:
import os, pg, sys, re, psycopg2

try:
    conn = psycopg2.connect("dbname='tk' host='localhost' port='5432' user='naa' password='123'")
except:
    print "unable to connect to db"
cur = conn.cursor()
print cur.execute("SELECT * FROM courses")  # problem here
The SQL command in psql returns the correct output. I can similarly run an INSERT in psql, but not from the Python script. I get no warning/error in /var/log.
Possible bugs are:
- cursor() (seems to be right, however)
- the syntax of the connect() method (seems to be OK, however)
You have to call one of the fetch methods on cur (fetchone, fetchmany, fetchall) to actually get the results of the query.
You should probably have a read through a tutorial for the DB-API.
You have to call the cur.fetchall() method (or one of the other fetch*() methods) to get the results from the query.
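A minimal sketch of both patterns, reusing the connection from the question (the column name in the INSERT is just a placeholder):
# SELECT: execute, then fetch
cur.execute("SELECT * FROM courses")
for row in cur.fetchall():
    print row

# INSERT: execute, then commit, otherwise the change is rolled back
# when the connection closes
cur.execute("INSERT INTO courses (name) VALUES (%s)", ("Databases 101",))
conn.commit()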
