How to read from Kudu in Python

I am trying to retrieve data from Kudu, but I am not able to install the kudu-python package in Anaconda or on my server. Can I get some help with it? The documentation on the internet is not very clear.

@Karthik, did you encounter any errors? I just installed the kudu-python client on Anaconda on CentOS 6.9. There was one gotcha with versioning, but otherwise it was straightforward. The only error I ran into was:
kudu/client.cpp:589:30: fatal error: kudu/util/int128.h: No such file or directory
There is a solution for it here: https://community.cloudera.com/t5/Data-Ingestion-Integration/can-not-install-kudu-python/td-p/67496
Otherwise, the steps are:
1. Install the Kudu client libraries as described on the Kudu website (https://kudu.apache.org/docs/installation.html#_install_on_rhel_or_centos_hosts):
wget http://archive.cloudera.com/kudu/redhat/6/x86_64/kudu/cloudera-kudu.repo
sudo mv cloudera-kudu.repo /etc/yum.repos.d/
sudo yum update
sudo yum install kudu kudu-client0 kudu-client-devel
2. Install a bunch of dev dependencies if you don't have them already:
sudo yum install autoconf automake libtool make gcc gcc-c++
3. Install Cython and kudu-python:
pip install Cython kudu-python==1.2.0
Once you have this installed, you can find examples in https://github.com/apache/kudu/tree/master/examples/python
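A minimal read sketch, following the connect/table/scanner pattern from the kudu-python README. The master host `kudu.master`, port 7051 and table name `python-example` are placeholders, and `read_kudu_table` is a hypothetical helper that takes an already-connected client. `read_all_tuples()` buffers the whole result in memory, so this only suits small scans.

```python
from typing import Any, List, Tuple

try:
    import kudu  # provided by the kudu-python package
except ImportError:  # lets the helper be imported where kudu-python is absent
    kudu = None


def read_kudu_table(client: Any, table_name: str) -> List[Tuple]:
    """Scan every row of a Kudu table and return the rows as tuples."""
    table = client.table(table_name)
    scanner = table.scanner()
    # open() starts the scan; read_all_tuples() collects every row in memory
    return scanner.open().read_all_tuples()


if __name__ == "__main__" and kudu is not None:
    # Placeholder master address and table name
    client = kudu.connect(host="kudu.master", port=7051)
    for row in read_kudu_table(client, "python-example"):
        print(row)
```

Predicates can be attached to the scanner before `open()` (the README uses `scanner.add_predicate(...)`) to avoid pulling the full table.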

I had no way to install kudu-client (Windows is not supported), so I used the cluster's Impala to read Kudu's tables:
from impala.dbapi import connect
conn = connect('<Impala Daemon>', port=21050)
cursor = conn.cursor()
cursor.execute('SELECT * FROM mytable LIMIT 100')
print(cursor.description) # prints the result set's schema
results = cursor.fetchall()
https://github.com/cloudera/impyla
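impyla follows the Python DB-API 2.0 (PEP 249), so the first item of each `cursor.description` entry is the column name. A small sketch that pairs it with the fetched rows; `fetch_as_dicts` is a hypothetical helper and works with any PEP 249 cursor, including the impyla one above.

```python
from typing import Any, Dict, List


def fetch_as_dicts(cursor: Any) -> List[Dict[str, Any]]:
    """Return the remaining rows of a DB-API cursor as dicts keyed by column name."""
    # PEP 249: description is a sequence of 7-item sequences; item 0 is the name
    columns = [col[0] for col in cursor.description]
    return [dict(zip(columns, row)) for row in cursor.fetchall()]
```

Call it right after `cursor.execute('SELECT * FROM mytable LIMIT 100')` in place of `cursor.fetchall()`.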

How to install libcurl with nss backend in aws ec2? (Python 3.6 64bit Amazon Linux)

I have an ec2 instance in AWS running Python3.6 (Amazon Linux/2.8.3) where I need to install pycurl with NSS ssl backend.
First I tried adding pycurl==7.43.0 --global-option="--with-nss" to my requirements.txt file, but I was getting installation errors. So I ended up doing it by adding a .config file in .ebextensions (which runs during deployment):
container_commands:
  09_pycurl_reinstall:
    # the upgrade option is because it will run after PIP installs the requirements.txt file.
    # and it needs to be done with the virtual-env activated
    command: 'source /opt/python/run/venv/bin/activate && PYCURL_SSL_LIBRARY=nss pip3 install pycurl --global-option="--with-nss" --upgrade --no-cache-dir --compile --ignore-installed'
Pycurl seems to be correctly installed, but the celery worker is not running. The celery worker logs show:
__main__.ConfigurationError: Could not run curl-config: [Errno 2] No such file or directory
If I SSH into the instance and run python3.6 -c 'import pycurl', I get a more detailed error:
ImportError: pycurl: libcurl link-time ssl backend (openssl) is different from compile-time ssl backend (nss)
So I guess that my problem is that I had previously installed libcurl with openSSL instead of NSS, and hence the mismatch between libcurl and pycurl.
According to another Stack Overflow question, to get libcurl with the NSS backend I should have installed it with:
sudo apt install libcurl4-nss-dev
But since the server is running Amazon Linux I can't use the apt command. So I did instead:
yum install libcurl-devel
And I guess this is the problem: this installs libcurl with OpenSSL support when I need it with NSS support.
How can I install libcurl with NSS on Amazon Linux? (I need NSS because I'm running a Django app with Celery using SQS as the broker, and SQS requires NSS.)
Thank you very much!
I just ran into the same issue and did manage to fix it :)
Retrieve the Amazon Linux configure options for libcurl:
curl-config --configure
All options referring to paths can be ignored, but the others must be kept if you want the same features as the system libcurl. Of course, --with-ssl is replaced with --without-ssl --with-nss.
1.1 Install prerequisites:
sudo yum install libssh2-devel nss-devel
(of course, you should rather add them to the packages > yum section of your ebextensions)
Compile libcurl from source (I chose 7.61.1 to match the version used by Amazon Linux 2018.03):
wget https://curl.haxx.se/download/curl-7.61.1.tar.gz
tar xf curl-7.61.1.tar.gz
cd curl-7.61.1
./configure '--build=x86_64-redhat-linux-gnu' '--host=x86_64-redhat-linux-gnu' '--target=x86_64-amazon-linux-gnu' '--program-prefix=' '--cache-file=../config.cache' '--disable-static' '--enable-symbol-hiding' '--enable-ipv6' '--enable-threaded-resolver' '--with-gssapi' '--with-nghttp2' '--without-ssl' '--with-nss' '--with-ca-bundle=/etc/pki/tls/certs/ca-bundle.crt' '--enable-ldap' '--enable-ldaps' '--enable-manual' '--with-libidn2' '--with-libpsl' '--with-libssh2' 'build_alias=x86_64-redhat-linux-gnu' 'host_alias=x86_64-redhat-linux-gnu' 'target_alias=x86_64-amazon-linux-gnu' 'CFLAGS=-O2 -g -pipe -Wall -Wp,-D_FORTIFY_SOURCE=2 -fexceptions -fstack-protector --param=ssp-buffer-size=4 -m64 -mtune=generic'
make
sudo make install
(of course this should be deployed properly as a Shell script through the files section of your ebextensions).
Now you should see libcurl.so.4 created in /usr/local/lib.
Define the following environment variables so that pycurl is compiled against your custom libcurl:
LD_LIBRARY_PATH=/usr/local/lib
PYCURL_CURL_CONFIG=/usr/local/bin/curl-config
PYCURL_SSL_LIBRARY=nss
Run your pip install.
You can check that pycurl is linked to the right libcurl:
ldd /opt/python/run/venv/local/lib64/python3.6/site-packages/pycurl.cpython-36m-x86_64-linux-gnu.so
It should show libcurl.so.4 => /usr/local/lib64/libcurl.so.4
And of course, python3.6 -c 'import pycurl' should work.
That's it! You should be able to run Celery with SQS.
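The environment-variable, install and verification steps above can also be scripted from Python. This is a sketch, assuming the custom libcurl went under the default /usr/local prefix; `pycurl_build_env`, `install_pycurl` and `ssl_backend` are hypothetical helper names. The check reads `pycurl.version`, which lists the libcurl build and its SSL library (for example `NSS/3.36.0` or `OpenSSL/1.0.2k`).

```python
import os
import re
import subprocess
import sys
from typing import Dict, Optional


def pycurl_build_env(prefix: str = "/usr/local") -> Dict[str, str]:
    """Copy the current environment and point the pycurl build at a custom libcurl."""
    env = dict(os.environ)
    env["LD_LIBRARY_PATH"] = prefix + "/lib"
    env["PYCURL_CURL_CONFIG"] = prefix + "/bin/curl-config"
    env["PYCURL_SSL_LIBRARY"] = "nss"
    return env


def install_pycurl(env: Dict[str, str]) -> None:
    """Rebuild pycurl from source using the given environment."""
    subprocess.run(
        [sys.executable, "-m", "pip", "install",
         "--force-reinstall",      # replace a pycurl built against the old libcurl
         "--no-cache-dir",         # do not reuse a previously built wheel
         "--no-binary", "pycurl",  # compile from the sdist so the variables apply
         "pycurl"],
        env=env,
        check=True,
    )


def ssl_backend(version: str) -> Optional[str]:
    """Return the SSL library named in a pycurl.version string, if any."""
    for name in ("NSS", "OpenSSL", "GnuTLS"):
        if re.search(r"\b" + name + "/", version):
            return name
    return None


if __name__ == "__main__":
    build_env = pycurl_build_env()
    install_pycurl(build_env)
    # Check in a fresh interpreter so the newly built module is the one imported
    out = subprocess.run(
        [sys.executable, "-c", "import pycurl; print(pycurl.version)"],
        env=build_env, capture_output=True, text=True, check=True,
    ).stdout
    print(out.strip(), "->", ssl_backend(out))
```

Note that `pycurl_build_env` overwrites any existing LD_LIBRARY_PATH, matching the single value given in the steps above.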
I had loads of issues getting pycurl installed on my Python ec2 instance. First up, pip wouldn't install pycurl at all. I had to ssh and run
yum install libcurl-devel
After this, I added the following line to my requirements.txt file so that it included the relevant ssl option (--with-openssl) as an install option:
pycurl==7.43.0.5 --install-option="--with-openssl"
Once I deployed a new version with this change in requirements.txt, pycurl worked fine, which I tested with:
python -c 'import pycurl'
I knew I needed to use openssl from the error message
libcurl link-time ssl backend (openssl)
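Reading the required backend out of that error message can be automated. A sketch; the regular expression assumes the exact wording of pycurl's ImportError quoted above, and `link_time_backend` is a hypothetical helper. The extracted name is the value to pass as PYCURL_SSL_LIBRARY (as in the NSS answer above), the environment-variable counterpart of an install option such as --with-openssl.

```python
import re
from typing import Optional

# Matches the wording of pycurl's ImportError quoted above
_LINK_TIME = re.compile(r"link-time ssl backend \((\w+)\)")


def link_time_backend(message: str) -> Optional[str]:
    """Return the SSL backend libcurl was linked against, or None if not found."""
    match = _LINK_TIME.search(message)
    return match.group(1) if match else None


if __name__ == "__main__":
    try:
        import pycurl  # noqa: F401
        print("pycurl imports fine")
    except ImportError as exc:
        print("Rebuild pycurl with PYCURL_SSL_LIBRARY=%s" % link_time_backend(str(exc)))
```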

Error by create secure_channel in grpc of python

grpc was installed using pip. I tried to use it and got an error. I searched for the error but could not find a solution.
The environment through uname is as follows.
env
uname -s -> Linux
uname -r -> 3.10.65
uname -m -> aarch64
code
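import grpc
# ssl_channel_credentials expects the PEM root certificates as bytes, so read in binary mode
with open('roots.pem', 'rb') as f:
    creds = grpc.ssl_channel_credentials(f.read())
channel = grpc.secure_channel('myservice.example.com:443', creds)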
error log
24061 ssl_transport_security.c:655] Could not set ephemeral ECDH key.
24061 security_connector.c:774] Handshaker factory creation failed with TSI_INTERNAL_ERROR.
(I love that xkcd comic.. I feel that way all the time)
I solved this on arm64, here's how:
Under the hood, grpcio uses its bundled BoringSSL unless an environment variable tells it to use the system's OpenSSL. I believe the issue seen here actually comes from BoringSSL. Assuming you have openssl, libssl1.0.0 and libssl-dev installed, and you have already installed grpcio via pip or pip3, do the following:
First, download the grpcio source tarball from PyPI, then:
pip3 uninstall grpcio
tar -xvf grpcio-<version>.tar.gz
cd grpcio-<version>/
GRPC_PYTHON_BUILD_SYSTEM_OPENSSL=True python3 setup.py install
I expect that setting GRPC_PYTHON_BUILD_SYSTEM_OPENSSL=True before your pip3 install would also work, but I haven't tested that. The above solution fixed my issue.
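The untested pip variant mentioned above could look like this sketch, assuming grpcio's setup.py honors the variable when pip builds from the source distribution. `--no-binary grpcio` matters because a prebuilt wheel (aarch64 wheels exist for newer releases) is already compiled and would ignore the variable. `grpcio_source_install_cmd` is a hypothetical helper.

```python
import os
import subprocess
import sys
from typing import List


def grpcio_source_install_cmd(version: str = "") -> List[str]:
    """pip command that compiles grpcio from source instead of using a wheel."""
    spec = "grpcio==" + version if version else "grpcio"
    return [sys.executable, "-m", "pip", "install",
            "--force-reinstall",      # replace the BoringSSL build already installed
            "--no-binary", "grpcio",  # a prebuilt wheel would ignore the variable
            spec]


if __name__ == "__main__":
    # Ask grpcio's build to link the system OpenSSL instead of bundled BoringSSL
    env = dict(os.environ, GRPC_PYTHON_BUILD_SYSTEM_OPENSSL="True")
    subprocess.run(grpcio_source_install_cmd(), env=env, check=True)
```

Building grpcio from source takes a long time on small ARM boards, so pinning the version you already had avoids an unexpected upgrade.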

PyODBC : can't open the driver even if it exists

I'm new to the Linux world and I want to query a Microsoft SQL Server from Python. It worked perfectly fine on Windows, but on Linux it's quite painful.
After some hours, I finally succeeded in installing the Microsoft ODBC driver on Linux Mint with unixODBC.
Then, I set up an Anaconda environment with Python 3.
Then I ran this:
import pyodbc as odbc
sql_PIM = odbc.connect("Driver={ODBC Driver 13 for SQL Server};Server=XXX;Database=YYY;Trusted_Connection=Yes")
It returns :
('01000', "[01000] [unixODBC][Driver Manager]Can't open lib '/opt/microsoft/msodbcsql/lib64/libmsodbcsql-13.0.so.0.0' : file not found (0) (SQLDriverConnect)")
What I do not understand is that PyODBC seems to read the right file path from odbcinst.ini and still does not work.
I went to "/opt/microsoft/msodbcsql/lib64/libmsodbcsql-13.0.so.0.0" and the file actually exists !
So why does it tell me that it does not exist ?
Here are some possible clues :
I'm on a virtual environment
I need to have "read" rights because it's a root filepath
I do not know how to solve either of these problems.
Thanks !
I also had the same problem on Ubuntu 14 after following the Microsoft tutorial for the SQL Server Linux ODBC Driver.
The file exists, but running ldd on it showed that some dependencies were missing:
/opt/microsoft/msodbcsql/lib64/libmsodbcsql-13.0.so.0.0: /usr/lib/x86_64-linux-gnu/libstdc++.so.6: version `GLIBCXX_3.4.20' not found (required by /opt/microsoft/msodbcsql/lib64/libmsodbcsql-13.0.so.0.0)
/opt/microsoft/msodbcsql/lib64/libmsodbcsql-13.0.so.0.0: /usr/lib/x86_64-linux-gnu/libstdc++.so.6: version `CXXABI_1.3.8' not found (required by
After searching for a while, I found it's because Ubuntu's repository didn't have GLIBCXX version 3.4.20; it only went up to 3.4.19.
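unixODBC reports any dlopen failure as "file not found", which hides messages like the ones above. Loading the driver with ctypes from the same Python environment surfaces the dynamic loader's own error text, including which dependency or symbol version is missing. A sketch; `try_load` is a hypothetical helper and the path is the one from the question.

```python
import ctypes
from typing import Tuple


def try_load(path: str) -> Tuple[bool, str]:
    """Try to load a shared library and return (success, loader message)."""
    try:
        ctypes.CDLL(path)
    except OSError as exc:
        # The OSError text comes from the dynamic loader and names the real
        # problem, e.g. a missing GLIBCXX version or an absent dependency
        return False, str(exc)
    return True, "loaded"


if __name__ == "__main__":
    ok, message = try_load("/opt/microsoft/msodbcsql/lib64/libmsodbcsql-13.0.so.0.0")
    print(ok, message)
```

Running it inside the same conda environment as pyodbc matters: that environment may put its own libstdc++ ahead of the system one, which is the mismatch the conda libgcc answers below describe.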
I then added a repo to Ubuntu, updated it and forced it to upgrade libstdc++6
sudo add-apt-repository ppa:ubuntu-toolchain-r/test
sudo apt-get update
sudo apt-get upgrade
sudo apt-get install libstdc++6
Problem solved, tested with isql:
+---------------------------------------+
| Connected!                            |
|                                       |
| sql-statement                         |
| help [tablename]                      |
| quit                                  |
|                                       |
+---------------------------------------+
SQL>
After that I tried testing using pdo_odbc (PHP), it then gave me the same driver not found error.
To solve this I had to create a symbolic link to fix libodbcinst.so.2:
sudo ln -s /usr/lib64/libodbcinst.so.2 /lib/x86_64-linux-gnu/libodbcinst.so.2
I had the same 'file not found (0) (SQLDriverConnect)' problem on macOS with the following code:
cnxn = pyodbc.connect('DRIVER={ODBC Driver 13 for SQL Server};SERVER=myServerIP,1433;DATABASE=myDBName;UID=sa;PWD=dbPassword')
After googling for two days, I still could not fix the issue, even after modifying freetds.conf, odbcinst.ini and odbc.ini.
Finally, I found the solution: replace the DRIVER value with the path to the driver library:
cnxn = pyodbc.connect('DRIVER={/usr/local/lib/libmsodbcsql.13.dylib};SERVER=myServerIP,1433;DATABASE=myDBName;UID=sa;PWD=dbPassword')
My dev environment
Mac OS X El Capitan
Python 3.6.1 in Anaconda
I found an answer that works for me here. This is for python 2.7 (so may not work for those who are looking for a solution for python 3.x).
The suggested solution is to update libgcc: 4.8.5-2 --> 5.2.0-0
For updating libgcc, use this command
conda update libgcc
The following suggestions may help to solve the problem:
Make sure the driver configuration INI file exists: run odbcinst -j and check odbcinst.ini.
Make sure the file path of the configured driver in that INI file exists and has read and execute permissions (the library is opened with O_RDONLY|O_CLOEXEC).
If you still get a file-not-found error even though the file exists, the problem could be a libgcc version mismatch, as per nehaljwani's GitHub comment. The solution is to update libgcc by running conda update libgcc.
Related: ODBC Driver 13 for SQL Server can't open lib on pyodbc while connecting on AWS E2 ubuntu instance.
For macOS, see: Installing ODBC via HomeBrew.
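The first two checks above can be scripted. A sketch that reads odbcinst.ini (use the path printed by odbcinst -j) with configparser and reports, for each driver section, whether its Driver= file exists and is readable and executable; `check_odbc_drivers` is a hypothetical helper. The section names are what DRIVER={...} in the connection string must match, and pyodbc.drivers() lists the names pyodbc actually sees.

```python
import configparser
import os
from typing import Dict


def check_odbc_drivers(ini_path: str) -> Dict[str, str]:
    """Map each section of odbcinst.ini to a status for its Driver= path."""
    # strict=False tolerates duplicate sections; interpolation=None keeps "%" literal
    parser = configparser.ConfigParser(interpolation=None, strict=False)
    parser.read(ini_path)
    report = {}
    for section in parser.sections():
        path = parser.get(section, "Driver", fallback=None)  # keys are case-insensitive
        if path is None:
            report[section] = "no Driver entry"
        elif not os.path.isfile(path):
            report[section] = "missing file: " + path
        elif not os.access(path, os.R_OK | os.X_OK):
            report[section] = "not readable/executable: " + path
        else:
            report[section] = "ok"
    return report


if __name__ == "__main__":
    for name, status in check_odbc_drivers("/etc/odbcinst.ini").items():
        print(name, "->", status)
```

An "ok" here only means the file is present; if pyodbc still fails, the loader test with ctypes shown in the earlier answer tells you which dependency is missing.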
Maybe it is a bit late, but I'll leave these scripts that worked for me.
My problem was the same as yours, and I tested all the options, such as changing the driver location, making a symbolic link, modifying the /etc/*.ini files, etc. Nothing worked.
My problem (running Python 3.6 and the pyodbc package in a Docker container based on alpine) was the libssl1.0.0 library.
Here you will find my installation script for pyodbc Debian 8 (alpine) docker image using the driver v13
DRIVER={ODBC Driver 13 for SQL Server}
The command I run for database connection was:
import pyodbc
connection_string = 'DRIVER={ODBC Driver 13 for SQL Server};'
connection_string += 'SERVER={0};DATABASE={1};UID={2};PWD={3};'.format(host,dbname,user,pwd)
connection = pyodbc.connect(connection_string)
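Splicing values in with str.format breaks as soon as a password contains ";". A sketch of a safer builder, assuming the ODBC convention (documented by Microsoft for its SQL Server drivers) that a value may be wrapped in braces and a literal "}" inside it is written as "}}". The helper names are hypothetical.

```python
from typing import Mapping

_SPECIAL = set(";{}")


def _odbc_value(key: str, value: str) -> str:
    """Brace-quote a value when needed, doubling any closing brace inside it."""
    needs_braces = (key.upper() == "DRIVER"
                    or any(c in _SPECIAL for c in value)
                    or value != value.strip())
    if not needs_braces:
        return value
    return "{" + value.replace("}", "}}") + "}"


def build_connection_string(params: Mapping[str, str]) -> str:
    """Join KEY=value pairs into an ODBC connection string, quoting as needed."""
    return "".join(key + "=" + _odbc_value(key, value) + ";" for key, value in params.items())


if __name__ == "__main__":
    import pyodbc

    conn_str = build_connection_string({
        "DRIVER": "ODBC Driver 13 for SQL Server",
        "SERVER": "myServerIP,1433",
        "DATABASE": "myDBName",
        "UID": "sa",
        "PWD": "dbPassword",
    })
    connection = pyodbc.connect(conn_str)
```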
I solved this problem after installing libssl1.0.0.
First, I set up my connection string this way:
cnxn = pyodbc.connect('DRIVER={/usr/local/lib/libmsodbcsql.13.dylib};SERVER=myServerIP,1433;DATABASE=myDBName;UID=sa;PWD=dbPassword')
Then, I installed libssl1.0.0:
echo "deb http://security.debian.org/debian-security jessie/updates main" >> /etc/apt/sources.list
apt-get update
apt-get install libssl1.0.0
Finally, I set up the locales:
apt-get -y install locales
echo "en_US.UTF-8 UTF-8" > /etc/locale.gen
locale-gen
After doing these steps, my python module was able to find and connect to database.
I had the same issue once:
1. Try conda update libgcc (pyodbc installed through pip and through conda look for different versions of the file). This might have been fixed since.
Link: https://github.com/ContinuumIO/anaconda-issues/issues/1639 (look for nehaljwani's answer).
2. Also check that the ODBC driver version number is correct in /etc/odbcinst.ini and /etc/odbc.ini: the names should match, and so should the driver path.
I had the same problem on a Mac mini M1 with macOS Mavericks. Even after installing ODBC Driver 18 from Microsoft, which supports ARM, it still did not work:
brew tap microsoft/mssql-release https://github.com/Microsoft/homebrew-mssql-release
brew update
HOMEBREW_NO_ENV_FILTERING=1 ACCEPT_EULA=Y brew install msodbcsql18 mssql-tools18
However, isql on the command line was able to connect to the database:
isql -v -k "DRIVER={ODBC Driver 18 for SQL Server};SERVER=<MYSERVERNAME>;PORT=<MYPORT>;DATABASE=<MYDATABASE>;UID=<USERNAME>;PWD=<PASSWORD>"
Finally, what did the trick was to uninstall pyodbc and install it again:
python -m pip uninstall pyodbc
python -m pip install pyodbc

Psycopg2 Python SSL Support is not compiled in

I am trying to connect to my Postgres database using psycopg2 with the sslmode='require' parameter; however, I get the following error:
psycopg2.OperationalError: sslmode value "require" invalid when SSL support is not compiled in
Here are a couple of details about my system:
Mac OS X El Capitan
Python 2.7
Installed psycopg2 via pip
Installed python via homebrew
Here is what I tried to do to fix the problem
brew uninstall python
`which python` still shows Python living in /usr/local/bin/python. I tried to uninstall it but couldn't, and I heard that this is the Python the OS uses, which should not be uninstalled anyway.
brew install python --with-brewed-openssl --build-from-source
pip uninstall psycopg2
pip install psycopg2
After doing all of this, the exception still happens. I am running this Python script via #!/usr/bin/env python. Not sure if it matters, but that is a different directory from the one `which python` shows.
Since you're installing via pip, you should be using the most recent version of psycopg2 (2.6.1).
After a little digging through the code, it seems that the exception is being thrown in connection_int.c, which directly calls the postgresql-c-libraries to set up the db-connection. The call happens like so:
self->pgconn = pgconn = PQconnectStart(self->dsn);
Dprintf("conn_connect: new postgresql connection at %p", pgconn);
if (pgconn == NULL)
{
    Dprintf("conn_connect: PQconnectStart(%s) FAILED", self->dsn);
    PyErr_SetString(OperationalError, "PQconnectStart() failed");
    return -1;
}
else if (PQstatus(pgconn) == CONNECTION_BAD)
{
    Dprintf("conn_connect: PQconnectdb(%s) returned BAD", self->dsn);
    PyErr_SetString(OperationalError, PQerrorMessage(pgconn));
    return -1;
}
The keywords you specify in your psycopg2.connect() call are handed to that function, and its errors are raised as an OperationalError exception.
The error is actually generated directly in the PostgreSQL client library (libpq). You may want to check which version you are using and how it was built and, if possible, upgrade to a version with SSL support or rebuild it from source with SSL enabled.
The postgresql-docs also state that missing SSL support will raise an error, if the sslmode is set to require, verify-ca or verify-full. See here under sslmode for reference.
The postgres-website lists several ways to install postgres from binary packages, so you might choose one which suits your needs. I'm not familiar with OSX, so I don't have a recommendation what's best.
This question may also be helpful.
You also need to reinstall the psycopg2-module, be sure to use the newly installed lib when rebuilding it. Refer to the linked question (in short, you will need to place the path to pg_config which is included in your new installation to $PATH when running pip install psycopg2).
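To see whether the pg_config on your PATH (the one `pip install psycopg2` builds against) belongs to an SSL-enabled PostgreSQL, you can inspect `pg_config --configure`, which prints the configure options PostgreSQL was built with. A Python 3.7+ sketch; the helper names are hypothetical, and the `--with-ssl=openssl` spelling is only used by newer PostgreSQL releases.

```python
import shutil
import subprocess
from typing import Optional


def configure_has_ssl(configure_output: str) -> bool:
    """True if pg_config --configure output shows an OpenSSL-enabled build."""
    return "--with-openssl" in configure_output or "--with-ssl=openssl" in configure_output


def pg_config_configure() -> Optional[str]:
    """Run the first pg_config on PATH with --configure, or return None if absent."""
    exe = shutil.which("pg_config")
    if exe is None:
        return None
    result = subprocess.run([exe, "--configure"], capture_output=True, text=True, check=True)
    return result.stdout


if __name__ == "__main__":
    output = pg_config_configure()
    if output is None:
        print("pg_config not found on PATH")
    else:
        print("SSL enabled:", configure_has_ssl(output))
```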
I had this same error, which turned out to be because I was using the Anaconda version of psycopg2. To fix it, I had to adapt VictorF's solution from here and run:
conda uninstall psycopg2
sudo ln -s /Users/YOURUSERNAME/anaconda/lib/libssl.1.0.0.dylib /usr/local/lib
sudo ln -s /Users/YOURUSERNAME/anaconda/lib/libcrypto.1.0.0.dylib /usr/local/lib
pip install psycopg2
Then when you run conda list you'll see psycopg2 installed with <pip> in the far right column. After that, I just restarted Python and everything worked.
The error you are receiving is caused by a problem with Postgres itself, and not psycopg2.
psycopg2.OperationalError: sslmode value "require" invalid when SSL support is not compiled in
The above indicates that the version of Postgres that psycopg2 is built against does not have SSL support compiled in. When you attempt to connect to the running Postgres server with sslmode=require, it throws this error.
You don't mention how you installed Postgres but since you are using Homebrew for other things, I recommend you also install Postgres the same way:
$ brew update
$ brew install postgresql
The formula for postgresql shows that it depends on openssl and compiles with the --with-openssl flag set. It will also install the necessary libpq headers. You may need to reinstall psycopg2 after this step to ensure it picks up the correct libraries/version.
Interestingly, there is a bug listed against conda which lists the same error that you report occurring when the conda version of psycopg2 — linked on a system with Homebrew postgres — was installed on a system without, suggesting missing SSL libraries can also trigger this.
I would suggest uninstalling any existing Postgres versions (including any installed via Homebrew) before reinstalling to minimise the risk of the wrong one being used.
As others have said, the error message looks to be coming from Postgres. You can verify this by typing psql sslmode=require: if it gives you a psql prompt, then SSL works with Postgres; if it errors, then it doesn't.
Try and remove postgres and reinstall with openssl support:
brew uninstall postgres
brew update
brew install postgres --with-openssl
Alternatively, and this is the way I'd suggest, there is a one click installer at http://postgresapp.com that might also make it easier to get it installed. Follow the instructions on the site to get it installed correctly.
When I did it on Yosemite it installed at ~/Library/Application\ Support/Postgres93/var
You'll also want to create a certificate (this could also be where the error is coming from), either from a certificate authority if this is going to be public facing in the slightest, or self-signed if it's for a dev/test environment.
openssl req -new -text -out server.req
openssl rsa -in privkey.pem -out server.key
rm privkey.pem
openssl req -x509 -in server.req -text -key server.key -out server.crt
chmod og-rwx server.key
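PostgreSQL refuses to use a private key that other users can read, which is what the chmod above prevents. A small sketch to confirm the key has no group or other permission bits, mirroring what `chmod og-rwx` guarantees; `key_permissions_ok` is a hypothetical helper.

```python
import os
import stat


def key_permissions_ok(path: str) -> bool:
    """True if the key file grants no permissions to group or others."""
    mode = stat.S_IMODE(os.stat(path).st_mode)
    return mode & 0o077 == 0


if __name__ == "__main__":
    print(key_permissions_ok("server.key"))
```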
Navigate to your config directory, by default it is: ~/Library/Application\ Support/Postgres93/var
Enable ssl support:
vim postgresql.conf
# change this:
# ssl = on
# to this:
ssl = on
Restart the app and then check ssl with psql "sslmode=require"
If that works then try it through your Python code. If it works with the code above, but not Python then it's definitely a Python code problem that needs to be worked through.
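A Python counterpart of the psql check, as a sketch: it takes the connect function as a parameter (pass psycopg2.connect), requests sslmode="require", and distinguishes the "not compiled in" error from other failures. `ssl_connect_check` is a hypothetical helper, and the host and database names in the usage are placeholders.

```python
from typing import Any, Callable


def ssl_connect_check(connect: Callable[..., Any], **params: Any) -> str:
    """Attempt a connection with sslmode=require and summarize the outcome."""
    try:
        conn = connect(sslmode="require", **params)
    except Exception as exc:  # psycopg2 raises OperationalError here
        if "SSL support is not compiled in" in str(exc):
            return "client libpq lacks SSL: reinstall libpq/psycopg2 with OpenSSL"
        return "connection failed: " + str(exc)
    conn.close()
    return "SSL connection succeeded"


if __name__ == "__main__":
    import psycopg2

    print(ssl_connect_check(psycopg2.connect, host="localhost", dbname="postgres"))
```

If psql succeeds but this reports that SSL is not compiled in, psycopg2 is linked against a different libpq than the one psql uses.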
Since I cannot comment yet, adding to Brideau's answer: this only worked for me after changing my version of Postgres.
brew uninstall postgres
brew update
brew install postgres --with-openssl
Then run the code provided by Brideau and it should work.
If you're using v2.6.1 or v2.6.2 of psycopg2 (like me), the answer was a simple upgrade to v2.7. Reading the release notes for psycopg2, there was a minor fix for SSL albeit it doesn't look particularly relevant.
My setup was as follows:
Mac OS X El Capitan 10.11.6
psycopg2 2.6.2 installed via pip
PostgreSQL installed via Enterprise DB Installer
Running pip uninstall psycopg2 followed by pip install psycopg2 resolved matters.
Try to install psycopg2 from MacPorts
sudo port install py27-psycopg2

cx_Oracle.so: undefined symbol:PyUnicodeUCS2_AsEncodedString

I am having issues installing cx_Oracle. I have installed the Oracle Instant Client and the cx_Oracle package, and once they are installed I get this error when importing cx_Oracle. I am running Ubuntu 11.10 as the host.
import cx_Oracle
Traceback (most recent call last):
File "<console>", line 1, in <module>
ImportError: /usr/lib/python2.7/dist-packages/cx_Oracle.so: undefined symbol:PyUnicodeUCS2_AsEncodedString
Does anyone have an idea how to resolve this issue?
Cheers
Most probably your Python install uses another unicode format (ucs4) and cx_oracle was compiled with ucs2.
You can install cx_Oracle 5.0.4 with the unicode flag. That worked for me but there is some bug: strange Oracle error: "invalid format text"
Or compile the latest cx_oracle yourself.
http://mrpolo.com.ve/?p=178 (it's in a language I don't know, but it helped)
In addition to @froZiegler's answer: when I came to the cx_Oracle page, there was no "...Unicode..." variant to download anymore. Luckily, compiling it myself from source was not as big a hassle as I expected.
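To confirm which Unicode build the interpreter uses before picking or compiling a cx_Oracle binary, check sys.maxunicode: on Python 2 it is 0xFFFF for a narrow (UCS2) build and 0x10FFFF for a wide (UCS4) build. On Python 3.3 and later it is always 0x10FFFF, so the check is only meaningful for Python 2. `unicode_build` is a hypothetical helper, written without type hints so it also runs on Python 2.7.

```python
import sys


def unicode_build(maxunicode=sys.maxunicode):
    """Classify a Python 2 build as UCS2 (narrow) or UCS4 (wide)."""
    return "UCS2" if maxunicode == 0xFFFF else "UCS4"


if __name__ == "__main__":
    print(unicode_build())
```

An extension that references PyUnicodeUCS2_* symbols, like the cx_Oracle.so above, will only import into an interpreter for which this prints UCS2.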
Here a summary about what I did (Ubuntu 12.04 LTS, 64bit):
install the proper Oracle XE client rpm with alien (11g, 64bit, etc...)
it installs to /u01/...; I had to adjust .profile too, of course.
download cx_Oracle source-tar, untar, cd into
I created the ln -s symlink for the Oracle .so library, as described in the BUILD text file
Install Python headers with sudo aptitude install python-dev
Compile with python setup.py build
Install with sudo python setup.py install
First try failed with distutils.errors.DistutilsSetupError: cannot locate an Oracle software installation
patched setup.py with setting userOracleHome = "/u01/app/oracle/product/11.2.0/xe" after os.getenv("ORACLE_HOME")
sudo python setup.py install then worked
Check with python -c 'import cx_Oracle' succeeded.
