Python Connection to Hive

I installed the Hortonworks Hive ODBC driver and created a connection in Data Sources. I tested it and it worked successfully.
I installed pyodbc and wrote the following code:
import os, sys, pyodbc
con = pyodbc.connect("DSN=MyCon")
I got this error:
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
pyodbc.Error: ('HYC00', '[HYC00] [Hortonworks][ODBC] (11470) Transactions are not supported. (11470) (SQLSetConnnectAttr(SQL_ATTR_AUTOCOMMIT))')
I also tried:
import pyodbc, sys, os
pyodbc.pooling = False
pyodbc.autocommit = False
con = pyodbc.connect("DSN=MyCon")
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
pyodbc.Error: ('HYC00', '[HYC00] [Hortonworks][ODBC] (11470) Transactions are not supported. (11470) (SQLSetConnnectAttr(SQL_ATTR_AUTOCOMMIT))')
I also tried:
con = pyodbc.connect("DSN=Tenet", autocommit=False)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
pyodbc.Error: ('HYC00', '[HYC00] [Hortonworks][ODBC] (11470) Transactions are not supported. (11470) (SQLSetConnnectAttr(SQL_ATTR_AUTOCOMMIT))')

I solved it. Rather than deleting my question, I am posting the answer here:
pyodbc.autocommit = True
con = pyodbc.connect("DSN=MyCon", autocommit=True)
This was based on the advice in this thread:
https://code.google.com/p/pyodbc/issues/detail?id=162
Thanks to the advice from Kyle Porter below; it totally makes sense now.
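For reference, a minimal end-to-end sketch with the fix applied; the DSN name MyCon comes from the question, and my_table is a placeholder:
import pyodbc

# Hive does not support transactions, so autocommit must be enabled
# both on the module and on the connection itself.
pyodbc.autocommit = True
con = pyodbc.connect("DSN=MyCon", autocommit=True)

cursor = con.cursor()
cursor.execute("SELECT * FROM my_table LIMIT 10")  # my_table is a placeholder
for row in cursor.fetchall():
    print(row)

cursor.close()
con.close()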

Related

How to integrate and mock Redshift and S3 locally using redshift-fake-driver

I would like to run Redshift and S3 locally and use them for tasks run from Airflow and other tools, to reduce CI/CD effort when deploying to dev, and to avoid conflicts over resources, files, and so on.
Currently I can use LocalStack's S3, but for Redshift the only solution I have found is the combination of redshift-fake-driver with the JayDeBeApi Python package, and it does not seem to work properly:
import jpype  # JPype1==1.4.1
import jaydebeapi  # JayDeBeApi==1.2.3

jars = "/Users/trancongminh/Downloads/jars/*"
jpype.startJVM(classpath=jars)
driverName = "jp.ne.opt.redshiftfake.postgres.FakePostgresqlDriver"
print(jpype.JClass(driverName))

# as I spin up a Docker container for PostgreSQL
connectionString = "jdbc:postgresqlredshift://localhost:5432/docker"
uid = "docker"
pwd = "docker"
driverFileName = "/Users/trancongminh/Downloads/jars/redshift-fake-driver_2.12-1.0.15.jar"
conn = jaydebeapi.connect(
    jclassname=driverName,
    url=connectionString,
    driver_args={"user": uid, "password": pwd},
    jars=driverFileName,
)
curs = conn.cursor()
curs.execute("SELECT * FROM pg_catalog.pg_tables limit 10;")
curs.fetchall()
curs.execute("copy db_table_name_v2 from 'http://localhost:4566/events-streaming/traveller/v2/ym_202210/d_04/hm_131901.parquet' CREDENTIALS 'aws_access_key_id=test;aws_secret_access_key=test' ")
But I get errors like "No such file or directory", or something like this:
Traceback (most recent call last):
File "FakeConnection.scala", line 31, in jp.ne.opt.redshiftfake.FakeConnection.prepareStatement
Exception: Java Exception
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/Users/trancongminh/Pelago/pelago-ds-env/lib/python3.9/site-packages/jaydebeapi/__init__.py", line 531, in execute
self._prep = self._connection.jconn.prepareStatement(operation)
java.lang.NoSuchMethodError: java.lang.NoSuchMethodError: 'void scala.util.parsing.combinator.Parsers.$init$(scala.util.parsing.combinator.Parsers)'
or maybe like this:
Traceback (most recent call last):
File "FakePreparedStatement.scala", line 138, in jp.ne.opt.redshiftfake.FakePreparedStatement$FakeAsIsPreparedStatement.execute
Exception: Java Exception
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/Users/trancongminh/Pelago/pelago-ds-env/lib/python3.9/site-packages/jaydebeapi/__init__.py", line 534, in execute
is_rs = self._prep.execute()
org.postgresql.util.PSQLException: org.postgresql.util.PSQLException: ERROR: could not open file "s3://events-streaming/traveller/v2/ym_202210/d_04/hm_131901.parquet" for reading: No such file or directory
Hint: COPY FROM instructs the PostgreSQL server process to read a file. You may want a client-side facility such as psql's \copy.
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/Users/trancongminh/Pelago/pelago-ds-env/lib/python3.9/site-packages/jaydebeapi/__init__.py", line 536, in execute
_handle_sql_exception()
File "/Users/trancongminh/Pelago/pelago-ds-env/lib/python3.9/site-packages/jaydebeapi/__init__.py", line 165, in _handle_sql_exception_jpype
reraise(exc_type, exc_info[1], exc_info[2])
File "/Users/trancongminh/Pelago/pelago-ds-env/lib/python3.9/site-packages/jaydebeapi/__init__.py", line 57, in reraise
raise value.with_traceback(tb)
File "/Users/trancongminh/Pelago/pelago-ds-env/lib/python3.9/site-packages/jaydebeapi/__init__.py", line 534, in execute
is_rs = self._prep.execute()
jaydebeapi.DatabaseError: org.postgresql.util.PSQLException: ERROR: could not open file "s3://events-streaming/traveller/v2/ym_202210/d_04/hm_131901.parquet" for reading: No such file or directory
Hint: COPY FROM instructs the PostgreSQL server process to read a file. You may want a client-side facility such as psql's \copy
If anybody has experience with this pattern, please help. Any solutions or keywords helpful for further investigation would be appreciated. Thanks.
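One observation, offered as a hedged sketch rather than a confirmed fix: a NoSuchMethodError on scala.util.parsing.combinator.Parsers.$init$ is the usual symptom of a Scala binary-version mismatch on the classpath, for example the _2.12 driver loaded next to a scala-library or scala-parser-combinators jar built for a different Scala version. Listing the jars explicitly instead of using a wildcard makes the versions visible; every file name below is a hypothetical placeholder:
import jpype

# Every Scala artifact must match the driver's Scala suffix (_2.12 here);
# the exact file names and versions are assumptions, not a verified set.
jars = [
    "/Users/trancongminh/Downloads/jars/redshift-fake-driver_2.12-1.0.15.jar",
    "/Users/trancongminh/Downloads/jars/scala-library-2.12.17.jar",
    "/Users/trancongminh/Downloads/jars/scala-parser-combinators_2.12-1.1.2.jar",
    "/Users/trancongminh/Downloads/jars/postgresql-42.5.1.jar",
]
jpype.startJVM(classpath=jars)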

How to fix the error 'TypeError: can't pickle time objects'?

I am using the OpenOPC library to read data from an OPC server, the Matrikon OPC Simulation Server. When I try to read the data, it sends me the following error:
TypeError: can't pickle time objects
The code I use is the following; I run it from the Python console.
CODE:
import OpenOPC
opc = OpenOPC.client()
opc.connect('Matrikon.OPC.Simulation')
opc.read('Random.Int4')
The error appears when I run the line opc.read('Random.Int4').
This is the complete error:
Traceback (most recent call last):
File "C:\Python27\Lib\multiprocessing\queues.py", line 264, in _feed
send(obj)
TypeError: can't pickle time objects
Traceback (most recent call last):
File "<input>", line 1, in <module>
File "C:\Users\User\PycharmProjects\OPC2\venv\lib\site-packages\OpenOPC.py", line 625, in read
return list(results)
File "C:\Users\User\PycharmProjects\OPC2\venv\lib\site-packages\OpenOPC.py", line 543, in iread
raise TimeoutError('Callback: Timeout waiting for data')
TimeoutError: Callback: Timeout waiting for data
I solved this issue by adding sync=True when calling opc.read().
CODE:
import OpenOPC
opc = OpenOPC.client()
opc.connect('Matrikon.OPC.Simulation')
opc.read('Random.Int4', sync=True)
Reference: mkwiatkowski/openopc
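For completeness, the same fix applied to a group read; a small sketch assuming the Matrikon simulator's built-in Random.* tags:
import OpenOPC

opc = OpenOPC.client()
opc.connect('Matrikon.OPC.Simulation')

# A synchronous group read returns (name, value, quality, timestamp) tuples.
for tag in opc.read(['Random.Int4', 'Random.Real8'], sync=True):
    print(tag)

opc.close()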

MySQL and TypeError: not enough arguments for format string

I have a problem with my program on my Raspberry Pi. This is my source code:
import MySQLdb
import csv

db = MySQLdb.connect(user='root', passwd='toor',
                     host='127.0.0.1', db='data')
cursor = db.cursor()
csv_data = csv.reader(file('datasensor.txt'))
for row in csv_data:
    sql = "insert into `kelembapan` (`id`,`Tanggal`,`Tipe_sensor`,`Value`,`Ket`) values (%s,%s,%s,%s,%s);"
    cursor.execute(sql, row)
db.commit()
cursor.close()
print "The Data has been inputted"
and this is the txt file:
1, 2017-10-10, sensor1,40,Kurang lembap
2, 2017-10-10, sensor2,60,Lembap
The program runs on my Ubuntu machine but not on my Raspberry Pi. When run on the Raspberry Pi, there is this error:
Traceback (most recent call last):
File "server.py", line 9, in <module>
cursor.execute(sql,row)
File "/usr/lib/python2.7/dist-packages/MySQLdb/cursors.py",line 159, in execute
query = query% db.literal(args)
TypeError: not enough arguments for format strings
Thanks in advance :)
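The error means the format string expects five values but some row supplies fewer, which typically happens when datasensor.txt contains a blank or short line. A hedged sketch of a defensive version of the same program, kept in the question's Python 2 style:
import MySQLdb
import csv

db = MySQLdb.connect(user='root', passwd='toor', host='127.0.0.1', db='data')
cursor = db.cursor()
sql = "insert into `kelembapan` (`id`,`Tanggal`,`Tipe_sensor`,`Value`,`Ket`) values (%s,%s,%s,%s,%s);"

for row in csv.reader(file('datasensor.txt')):
    # A row with fewer than five fields would raise
    # "not enough arguments for format string", so skip it.
    if len(row) != 5:
        continue
    cursor.execute(sql, [field.strip() for field in row])

db.commit()
cursor.close()
print "The Data has been inputted"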

How to change command timeout in pywin32. 'Open' method

I've got a problem using the pywin32 library to connect to OLEDB.
Traceback
Traceback (most recent call last):
File "<input>", line 35, in <module>
File "<input>", line 31, in ado
File "<COMObject ADODB.Recordset>", line 4, in Open
the XML parser for analysis: the response Time for the XML for analysis request timed out before it was completed.', None, 0, -2147467259), None)
I've tried adding Connect Timeout=1000 to my connection string, to no avail.
Code
import win32com.client

def ado():
    conn = win32com.client.Dispatch(r'ADODB.Connection')
    DSN = CONNECTION_STRING
    conn.Open(DSN)
    rs = win32com.client.Dispatch(r'ADODB.Recordset')
    strsql = u"""
    select
    ...
    ...
    ...
    """
    h = rs.Open(strsql, conn, 0, 1)
    ts = rs.GetRows()
    conn.Close()
    return ts
I think the problem is here:
h = rs.Open(strsql, conn, 0, 1)
I can't see which parameters should be passed to Open, but I think there must be a timeout parameter. How can I change the command timeout?
The problem was solved by adding:
conn.CommandTimeout = 3000
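In context, the fix looks like the sketch below; ADO's CommandTimeout is given in seconds (the default is 30), and CONNECTION_STRING stays a placeholder from the question:
import win32com.client

CONNECTION_STRING = "..."  # placeholder, as in the question

conn = win32com.client.Dispatch(r'ADODB.Connection')
conn.Open(CONNECTION_STRING)
conn.CommandTimeout = 3000  # seconds; raises the limit for commands on this connection

rs = win32com.client.Dispatch(r'ADODB.Recordset')
strsql = u"select ..."      # the query itself is elided, as in the question
rs.Open(strsql, conn, 0, 1)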

Hive error: 'ExecuteStatement finished with operation state: ERROR_STATE'. (35) (SQLExecDirectW)

Here is some Python code being executed against a Hive database:
pyodbc.autocommit = True
con = pyodbc.connect("DSN=MyCon", autocommit=True)
cursor = con.cursor()
cursor.execute("select name, surname from foo f inner join bar b on f.id = b.id")
Error
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
pyodbc.Error: ('HY000', "[HY000] [Hortonworks][HiveODBC] (35) Error from Hive: error code: '0' error message: 'ExecuteStatement finished with operation state: ERROR_STATE'. (35) (SQLExecDirectW)")
I solved it: when creating the ODBC connection, use the user hdfs. I had read a tutorial and was using the user hue, which caused the problem.
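If you prefer not to bake the user into the DSN, the same fix can be expressed as a DSN-less connection string. A hedged sketch; the exact driver name, host, and authentication settings depend on your installation:
import pyodbc

pyodbc.autocommit = True
con = pyodbc.connect(
    "DRIVER={Hortonworks Hive ODBC Driver};"
    "HOST=my-hive-host;"  # placeholder host
    "PORT=10000;"
    "UID=hdfs;",          # connect as hdfs rather than hue
    autocommit=True,
)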
