Connecting to standalone HBase from Python

I am developing a Python application with HBase as the backend. I have installed HBase and its shell works perfectly. Note that I have not yet installed Hadoop, since I don't have a cluster of nodes, but I decided to use HBase because of its impressive architecture.
Now the problem is that I am unable to connect to HBase from Python, either through libraries like happybase or directly via Thrift. I also tried this guide - http://binesh.in/hbase/connecting-to-a-remote-standalone-hbase/ - but with no luck. Please help me with this.
Update -
>>> import happybase
>>> con = happybase.Connection('localhost')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/lib/python2.7/dist-packages/happybase/api.py", line 121, in __init__
    self.open()
  File "/usr/local/lib/python2.7/dist-packages/happybase/api.py", line 138, in open
    self.transport.open()
  File "build/bdist.linux-i686/egg/thrift/transport/TTransport.py", line 149, in open
  File "build/bdist.linux-i686/egg/thrift/transport/TSocket.py", line 99, in open
thrift.transport.TTransport.TTransportException: Could not connect to localhost:9090
>>>
Almost the same problem occurs when calling Thrift directly. In all, I just want to use the HBase database instead of MongoDB in my Python application - no Hadoop, no HDFS, and so on. Is this feasible, or am I trying to achieve something impossible?

The Thrift server has to be up and running, and your connection has to be opened as well:
nohup hbase thrift start &
Open the connection in Python before using it. Example:
import happybase
connection = happybase.Connection('localhost', autoconnect=False)
connection.open()
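To verify the connection end to end, you can list the tables and do a simple read/write. This is a minimal sketch; it assumes a table 'mytable' with a column family 'cf' already exists (created from the HBase shell with create 'mytable', 'cf'):
import happybase

connection = happybase.Connection('localhost', autoconnect=False)
connection.open()
print(connection.tables())  # lists the tables the Thrift server exposes

table = connection.table('mytable')
table.put(b'row-1', {b'cf:greeting': b'hello'})  # write one cell
print(table.row(b'row-1'))                       # read it back
connection.close()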

Related

Python mariadb module does not connect to database on network

I am trying to connect to a MariaDB database on my local network using Python.
import mariadb
cursor = mariadb.connect(host='192.168.178.77', user='someuser', password='somepass', db='temps')
Output is:
Traceback (most recent call last):
File "/Users/localuser/PycharmProjects/SQL/main.py", line 20, in <module>
cursor = mariadb.connect(host='192.168.178.77', user='someuser', password='somepass', db='temps')
File "/Users/user/.conda/envs/SQL/lib/python3.10/site-packages/mariadb/__init__.py", line 142, in connect
connection = connectionclass(*args, **kwargs)
File "/Users/localuser/.conda/envs/SQL/lib/python3.10/site-packages/mariadb/connections.py", line 86, in __init__
super().__init__(*args, **kwargs)
mariadb.OperationalError: Can't connect to server on '192.168.178.77' (60)
I can connect via PyCharm's database functionality and send SQL statements.
I can also use DB management tools from that same host and work with the data without any issue.
It even works from my phone.
This code is the only place where I get an error.
The OS is macOS 13.0.1.
Thank you!
This happens due to a bug in MariaDB Connector/C (issue CONC-612).
The issue was fixed in Connector/C version 3.3.3, which is available via brew. After
brew update
brew upgrade mariadb-connector-c
the connection should work as expected.
I ran into the same problem recently. Add the port parameter and double-check the other connection arguments, as shown below. If that doesn't help, try mysql-connector-python, which works similarly, or install the MariaDB connector manually.
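A minimal sketch of that, with the port set explicitly (3306 is MariaDB's default; the host and credentials are the placeholders from the question):
import mariadb

connection = mariadb.connect(
    host='192.168.178.77',
    port=3306,          # MariaDB's default port, set explicitly
    user='someuser',
    password='somepass',
    database='temps'
)
cursor = connection.cursor()
cursor.execute("SELECT 1")  # simple round-trip to confirm the link works
print(cursor.fetchone())
connection.close()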

PySpark: "/usr/libexec/java_home/bin/java: Not a directory" (macOS Big Sur)

I'm trying to run a hello-world PySpark application.
I'm using PyCharm.
The code of my LOL.py script:
import os
os.environ["SPARK_HOME"] = "/opt/spark"
from pyspark.sql import SparkSession

def init_spark():
    spark = SparkSession.builder.appName("HelloWorld").getOrCreate()
    sc = spark.sparkContext
    return spark, sc

def main():
    spark, sc = init_spark()
    nums = sc.parallelize([1, 2, 3, 4])
    print(nums.map(lambda x: x * x).collect())

if __name__ == '__main__':
    main()
Output:
/opt/spark/bin/spark-class: line 71: /usr/libexec/java_home/bin/java: Not a directory
Traceback (most recent call last):
File "/Users/evgenii/PycharmProjects/bi-etl-orders/LOL.py", line 19, in <module>
main()
File "/Users/evgenii/PycharmProjects/bi-etl-orders/LOL.py", line 13, in main
spark,sc = init_spark()
File "/Users/evgenii/PycharmProjects/bi-etl-orders/LOL.py", line 8, in init_spark
spark = SparkSession.builder.appName("HelloWorld").getOrCreate()
File "/Users/evgenii/.pyenv/versions/3.7.5/lib/python3.7/site-packages/pyspark/sql/session.py", line 173, in getOrCreate
sc = SparkContext.getOrCreate(sparkConf)
File "/Users/evgenii/.pyenv/versions/3.7.5/lib/python3.7/site-packages/pyspark/context.py", line 349, in getOrCreate
SparkContext(conf=conf or SparkConf())
File "/Users/evgenii/.pyenv/versions/3.7.5/lib/python3.7/site-packages/pyspark/context.py", line 115, in __init__
SparkContext._ensure_initialized(self, gateway=gateway, conf=conf)
File "/Users/evgenii/.pyenv/versions/3.7.5/lib/python3.7/site-packages/pyspark/context.py", line 298, in _ensure_initialized
SparkContext._gateway = gateway or launch_gateway(conf)
File "/Users/evgenii/.pyenv/versions/3.7.5/lib/python3.7/site-packages/pyspark/java_gateway.py", line 94, in launch_gateway
raise Exception("Java gateway process exited before sending its port number")
Exception: Java gateway process exited before sending its port number
I know that "Java gateway process exited before sending its port number" is often raised because of incorrect setup of JAVA_HOME.
But I think this is not the case, because my JAVA_HOME looks pretty normal:
$ echo $JAVA_HOME
/Library/Java/JavaVirtualMachines/adoptopenjdk-8.jdk/Contents/Home
This is my SPARK_HOME:
$ echo $SPARK_HOME
/opt/spark
My environment:
Python 3.7.5 (I use pyenv if it matters)
Java adoptopenjdk-8.jdk (installed via Homebrew. I also have adoptopenjdk-11.jdk stored in the same folder if it matters)
PySpark 2.4.0
macOS Big Sur 11.5.2
PyCharm Pro 2021.1.3
I have read several related guides, but none of them has helped so far.
I would appreciate any help.
Thank you in advance!
Spark 2.4.0 requires Java 8, which you have on your system, but it looks like PyCharm is not using that version even though it is available. I had a similar problem with Eclipse, which defaulted to the latest version of Java and caused compatibility issues; I suspect the same thing is happening in your application.
Are you running your app from PyCharm? If so, can you display the environment values from there?
How have you installed Spark? You mention brew for Java; did you also use brew for Spark? If not, it may be worth a try, as dependencies are really tricky.
What happens when you run java -version in a shell? Can you try to uninstall Java 11 and reinstall Java 8?
I hope these few pointers help.
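One way to make sure PySpark picks up Java 8 no matter what PyCharm injects is to set JAVA_HOME from the script itself before the SparkSession is created, mirroring how the question already sets SPARK_HOME. A minimal sketch; /usr/libexec/java_home -v 1.8 is macOS's standard way to locate a Java 8 JDK, and the Spark path is taken from the question:
import os
import subprocess

# Resolve the Java 8 home the way macOS intends, instead of relying on
# whatever JAVA_HOME the IDE happens to pass along.
java8_home = subprocess.check_output(
    ["/usr/libexec/java_home", "-v", "1.8"]).decode().strip()
os.environ["JAVA_HOME"] = java8_home
os.environ["SPARK_HOME"] = "/opt/spark"

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("HelloWorld").getOrCreate()
print(spark.sparkContext.parallelize([1, 2, 3, 4]).map(lambda x: x * x).collect())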

Writing a network Scanner with python using scapy

I am writing a simple network scanner in Python using Scapy. The following is my code:
import scapy.all as scapy

def scan(ip):
    scapy.arping(ip)

scan("192.168.1.1/24")
The error I am getting:
Traceback (most recent call last):
File "ipScanner.py", line 10, in <module>
scan("192.168.1.1/24")
File "ipScanner.py", line 8, in scan
scapy.arping(ip)
File "/Users/omairkhan/opt/anaconda3/lib/python3.7/site-packages/scapy/layers/l2.py", line 648, in arping
filter="arp and arp[7] = 2", timeout=timeout, iface_hint=net, **kargs) # noqa: E501
File "/Users/omairkhan/opt/anaconda3/lib/python3.7/site-packages/scapy/sendrecv.py", line 553, in srp
filter=filter, nofilter=nofilter, type=type)
File "/Users/omairkhan/opt/anaconda3/lib/python3.7/site-packages/scapy/arch/bpf/supersocket.py", line 242, in __init__
super(L2bpfListenSocket, self).__init__(*args, **kwargs)
File "/Users/omairkhan/opt/anaconda3/lib/python3.7/site-packages/scapy/arch/bpf/supersocket.py", line 62, in __init__
(self.ins, self.dev_bpf) = get_dev_bpf()
File "/Users/omairkhan/opt/anaconda3/lib/python3.7/site-packages/scapy/arch/bpf/core.py", line 114, in get_dev_bpf
raise Scapy_Exception("No /dev/bpf handle is available !")
scapy.error.Scapy_Exception: No /dev/bpf handle is available !
Exception ignored in: <function _L2bpfSocket.__del__ at 0x105984c20>
Traceback (most recent call last):
File "/Users/omairkhan/opt/anaconda3/lib/python3.7/site-packages/scapy/arch/bpf/supersocket.py", line 139, in __del__
self.close()
File "/Users/omairkhan/opt/anaconda3/lib/python3.7/site-packages/scapy/arch/bpf/supersocket.py", line 211, in close
if not self.closed and self.ins is not None:
AttributeError: 'L2bpfSocket' object has no attribute 'ins'
Can anyone please help me understand it?
NOTE: I am running it on macOS.
I wrote this exact program with matching syntax when I first started programming, and it ran correctly on my systems when run as administrator. I develop on Linux and Windows rather than Mac, but I will offer what I can.
Are you running this script through your IDE or calling it from the shell?
I recommend only running it from the shell. That simply gives you more control over the files: you can specify which version of Python runs the script, and if the script needs administrative privileges, you can elevate its permissions in the shell.
Also, on my OS I was taught to always add the following as the first line of every script (and I have experienced the mistakes of forgetting it):
#!/usr/bin/env python
At least on Linux, this shebang line tells the system how to treat the file, namely to run it as a Python script (yes, I acknowledge it is already being run as Python). I would check whether the same applies on macOS.
Most of what I have recommended comes down to the "No /dev/bpf handle is available" error, which has only ever been an issue for me when I am not running the script as an administrator (although Linux reports it as permission denied). I should also mention that using Anaconda on Windows in the past (before I understood the structure of my file systems) prevented me from using common modules like pygame and scapy; my guess is that Anaconda kept its own copy of the module in its own directory, so the computer could not find every piece of it on the expected PATH.
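On macOS, that "No /dev/bpf handle is available" error typically means the script was not run as root, since the BPF devices Scapy opens are root-only. A minimal sketch of the same scanner with a root check and the scan results printed (arping returns a pair of answered/unanswered lists):
import os
import scapy.all as scapy

def scan(ip):
    # Raw BPF access requires root on macOS, so fail early with a hint.
    if os.geteuid() != 0:
        raise SystemExit("Run with elevated privileges, e.g.: sudo python3 ipScanner.py")
    answered, unanswered = scapy.arping(ip)
    for sent, received in answered:
        print(received.psrc, received.hwsrc)  # IP and MAC of each responder

if __name__ == "__main__":
    scan("192.168.1.1/24")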

Python connect to Hadoop using Hive with Kerberos authentication

I am trying to connect Python to Hadoop using Hive with Kerberos. I have tried various sources but failed to connect.
import pyhs2

conn_config = {'krb_host': 'hostname', 'krb_service': 'hive'}
pyhs2.connect(host='hostname',
              port=10000,
              authMechanism="KERBEROS",
              password="********",
              user='hostname#SCGLOBALUAT.ADUAT.SCOTIACAPITAL.COM')
Error Encountered:
authMechanism="KERBEROS") as conn:
File "build\bdist.win-amd64\egg\pyhs2\__init__.py", line 7, in connect
File "build\bdist.win-amd64\egg\pyhs2\connections.py", line 46, in __init__
File "build\bdist.win-amd64\egg\pyhs2\cloudera\thrift_sasl.py", line 66, in open
thrift.transport.TTransport.TTransportException: Could not start SASL: Error in sasl_client_start (-4) SASL(-4): no mechanism available: Unable to find a callback: 2
Can somebody please give me clear instructions on how to connect Python to Hadoop using Hive with a Kerberos ticket?
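For what it is worth, with authMechanism="KERBEROS" HiveServer2 clients generally authenticate using an existing Kerberos ticket rather than a password, so the password argument is typically ignored. A minimal sketch under that assumption; it presumes a ticket was already obtained with kinit and that the system's SASL GSSAPI plugin (e.g. cyrus-sasl-gssapi) is installed, since the "no mechanism available" error usually points to a missing SASL plugin:
import pyhs2

# Assumes `kinit user@REALM` has been run and the Cyrus SASL GSSAPI
# plugin is installed on the client machine.
with pyhs2.connect(host='hostname',
                   port=10000,
                   authMechanism="KERBEROS") as conn:
    with conn.cursor() as cur:
        cur.execute("show databases")
        for row in cur.fetch():
            print(row)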

Opening Google App Engine in a Python script

I'm fairly new to programming and decided to set up a simple Python script that would open all the applications I use for web-app development. The code I am using (for GAE) is:
import subprocess

google_appengine = r'C:\Applications\google_app_engine\launcher\GoogleAppEngineLauncher.exe'
subprocess.Popen(google_appengine)
This works fine for the other programs I am opening, but I am unable to run any applications within App Engine after I have opened it this way. I get the following error in my App Engine log file:
Exception in thread Thread-2:
Traceback (most recent call last):
File "threading.pyc", line 486, in __bootstrap_inner
File "launcher\taskthread.pyc", line 65, in run
File "subprocess.pyc", line 587, in __init__
File "subprocess.pyc", line 700, in _get_handles
File "subprocess.pyc", line 745, in _make_inheritable
WindowsError: [Error 6] The handle is invalid
I'm guessing it is the way subprocess.Popen() works, but I haven't been able to find any alternatives. I'm running Windows 7 if that makes a difference. Thanks for looking.
If you want to manage the local dev_appserver, this is the wrong approach.
The best way to do this is to clone the SDK repository (https://code.google.com/p/googleappengine/) directly to your drive and then add that path to your PYTHONPATH environment variable.
Here's a link to a script template I created and often use to manage startup and killing of the dev_appserver process: https://gist.github.com/4514647
I'm not too familiar with managing a Python environment on Windows, so you'd have to take my notes at a high level and research the specific implementation for that platform.
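As a rough illustration of that approach, here is a minimal sketch of starting and stopping dev_appserver.py from a script; the SDK and app paths are assumptions, and handing the child explicit stdout/stderr handles sidesteps the kind of invalid-handle error from the question, which Popen can hit on Windows when no usable console handle is available to inherit:
import os
import subprocess

# Assumed locations -- adjust to wherever the SDK repo was cloned and
# where your app's app.yaml lives.
SDK_PATH = r'C:\googleappengine\python'
APP_PATH = r'C:\projects\myapp'

def start_dev_appserver():
    log = open(os.path.join(APP_PATH, 'dev_appserver.log'), 'w')
    # Explicit handles mean the child never tries to inherit the
    # parent's (possibly invalid) console handles.
    return subprocess.Popen(
        ['python', os.path.join(SDK_PATH, 'dev_appserver.py'), APP_PATH],
        stdout=log, stderr=subprocess.STDOUT)

server = start_dev_appserver()
# ... work with the dev server, then shut it down:
server.terminate()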
