Python 2.7 Modules with Hive Streaming - python

I am doing Hive Streaming on a DSE 3.0 cluster (Hive 0.9) using a Python mapper. My Python script imports the statsmodels module, which requires Python 2.7. Since the default Python on the cluster is 2.4, not 2.7, I downloaded and installed Python 2.7 as well as the statsmodels module.
However, when I run the simple Hive query
hive> select transform (line) using 'python python-mapper.py' from docs;
where "docs" is a Hive table whose "line" column is a STRING, I get the error:
File "python-mapper.py", line 6, in ?
import statsmodels
ImportError: No module named statsmodels
So I changed my Hive query to:
hive> select transform (line) using 'python2.7 python-mapper.py' from docs;
to invoke version 2.7. But then I get the error
Caused by: java.io.IOException: Cannot run program "python2.7":
java.io.IOException: error=2, No such file or directory
I have also tried python27 and /usr/local/bin/python2.7 and I still get the same error. Has anyone encountered this before? I have already consulted the second answer to the post "On linux SUSE or RedHat, how do I load Python 2.7". Any advice would be greatly appreciated!
Thanks,
AM

I know this is a bit old, but I came across the same problem recently and thought I would answer for anybody else who runs into it.
The bare python2.7 command may not work if you have more than one version of Python installed, because the interpreter is not necessarily on the PATH that the Hive tasks use (the "No such file or directory" error means the executable could not be found).
There are two ways of solving this. One, use a Python virtual environment, which you can add as a resource so that it is distributed across all your nodes and used to start your script. Two, find out where your python2.7 executable is installed by typing:
which python2.7
and then reference the location in your hive query like so (example):
select transform (line) using '/usr/local/bin/python2.7 python-mapper.py' from docs;
Caution: python2.7 may be installed in a different location on each node, so check beforehand. Better yet, use a virtual environment.
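To confirm which interpreter a node actually uses, a small diagnostic mapper can help. The following is only a sketch: the function names are made up for illustration, and it assumes that whatever the script writes to stderr ends up in the task logs. It passes every input line through unchanged and logs the interpreter path and version once.

```python
import sys


def interpreter_report():
    """Return 'path<TAB>version' for the interpreter running this script."""
    version = "%d.%d.%d" % sys.version_info[:3]
    return "\t".join([sys.executable or "<unknown>", version])


def run_mapper(lines, out, log):
    """Log the interpreter once, then copy every input line to the output."""
    log.write(interpreter_report() + "\n")
    for line in lines:
        out.write(line)
```

In a real mapper script you would finish with run_mapper(sys.stdin, sys.stdout, sys.stderr) and call the script from the transform query in place of python-mapper.py. The code avoids version-specific syntax, so it should run under 2.4, 2.7 and 3.x alike.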

Related

Importing python modules works in command line but not in the python 3.8 shell

I'm having a problem with importing modules in python.
When I run my program in the command line it works perfectly fine.
However, when I try to run the same program in the python shell I am prompted with the following error:
ModuleNotFoundError: No module named 'matplotlib'
I already successfully installed matplotlib using 'python -m pip install matplotlib'.
I've read this can happen when you have two different versions of python installed; however, I don't.
I've uninstalled and reinstalled python and I still am having the same issue. I've also uninstalled and reinstalled matplotlib using pip.
I believe my problem is the module paths that python uses to search for imported modules are different between the two.
When I use the 'print(sys.path)' command in the python shell and the command line I get two different outputs.
Any help would be greatly appreciated!!!
(Screenshot: the different system paths in the Python shell and on the command line)
You have two versions of Python installed. I would recommend removing all the Pythons you have and switching to Anaconda: https://www.anaconda.com/distribution/. It will fix your path problems and let you create environments with different versions of Python. It is also the least painful approach for the future :) Good luck.
I suppose you have two Python versions installed on the same computer.
If so, my answer would be to go into the Scripts folder of each Python installation and install matplotlib in both of them.
I have faced this issue too. My path includes the pip of Python 3.7.1, and whenever I try to import modules in Python 3.4, it throws an error!
Maybe, you could add both of the Pythons to the path.
I encountered this same problem: python -c "import sklearn" would work just fine, but import sklearn inside a Python program failed. Both my one-liner and my program were using the same Python version (3.8.10).
I eventually got the program to work by replacing the shebang line (originally #!/usr/bin/python) with #!/bin/env python.
I don't know exactly why this worked (sorry). Presumably some path got reset and the module was loaded from a different location, but it might help someone, so I'm posting it here nonetheless. (If you know more, feel free to edit this answer.)
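One way to narrow this kind of mismatch down is to run the same few lines in both places (the command line and the shell or IDE) and compare the results. This is a minimal sketch for Python 3.4 or later; locate_module is a made-up helper name, not part of any library.

```python
import importlib.util
import sys


def locate_module(name):
    """Report which interpreter is running and where it would import `name` from."""
    spec = importlib.util.find_spec(name)  # None if the module cannot be found
    return {
        "executable": sys.executable,
        "found": spec is not None,
        "origin": getattr(spec, "origin", None),
    }


print(locate_module("json"))
```

Replace "json" with the module that fails, for example "matplotlib". If "executable" differs between the two environments, two interpreters are in play even if only one was installed deliberately. If it is the same but "found" differs, compare sys.path next.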

Using Flask and Jython under Windows 7

Hello everyone,
So I have been trying to use Jython to connect to a REST API and retrieve some information. Now I want to use the Flask framework with it. I have been trying to install Flask with Jython, but it does not seem to work at all. I am working on a Windows 7 machine, and an added problem is that I cannot download directly from the internet. For all other frameworks I used Python wheels and installed them with Jython, which worked fine.
I have already tried the following commands and got these errors:
The first error I got was that it could not find the '__init__.py' file in the flask folder, so I changed the path in the file to the full path. But it just continued to give me more errors.
jython -m pip install '*.whl
(Screenshot of the command line output of the error)
pip install '*.whl (same as above)
I am a little stuck here and I hope that someone has an idea on how to solve this problem.
Big thanks already!!
This appears to be a bug with Jython 2.7.0. See this error report in pip and this one in Jython.
The second of those indicates that it is fixed in the 2.7.1 release candidate.

Illegal instruction: 4 when importing python pandas

Since my Macbook with an i7 CPU is currently with AppleCare, I am now working on an older Mac mini with a core duo CPU. I simply connected the Macbook's internal disk via USB to the Mac mini.
Now, back at my Python scripts, I ran into a problem which I don't fully understand and do not know how to debug. When I import pandas in Python 2.7.9, Python crashes completely and I get the error Illegal instruction: 4. After some googling, I assume that some packages are compiled for the wrong architecture, but I don't know which ones.
I installed Python, numpy and scipy with homebrew and pandas, etc. with pip into a virtual environment. My system is OS X 10.10.5.
The output of python -vc "import pandas" is very long and given here.
I tried re-installing Python, pandas, numpy, and scipy.
How can I find out which package is causing the error?
Do I need to set an architecture flag or something?
How can I fix this?
Removing the .pyc files might work too.
Since it happens right after the call to
dlopen("/usr/local/lib/python2.7/site-packages/matplotlib/_pabc.so", 2);,
you can try checking the arch type that file was built for with:
file /usr/local/lib/python2.7/site-packages/matplotlib/_pabc.so
then check the arch type of your hardware:
uname -a
If the shared object file (_pabc.so) was not built for that machine, you may need to recompile or reinstall matplotlib or one of its dependencies.
In my recent experience, this was indeed caused by a linked library having a different architecture from the module's library (as chown suggested).
In particular, a C-compiled library that is part of the Python module you're importing (the _mymodule.so file in the module directory) was calling a linked system library (e.g. libgfortran.dylib), and there was an architecture mismatch between the two.
As mentioned above, you can check the architecture of your system with uname -a and check the arch of an offending dylib with the file /path/to/lib.dylib command.
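If you want to check many libraries at once, the same comparison can be roughly approximated in Python. This is only a sketch: binary_arch is a made-up helper, it only looks at the file header, and its lookup tables cover just a handful of common CPU types, so the file command remains the authoritative tool.

```python
import platform
import struct

# Partial lookup tables (only a few common CPU types are listed).
MACHO_CPU = {7: "i386", 0x01000007: "x86_64", 12: "arm", 0x0100000C: "arm64"}
ELF_MACHINE = {3: "i386", 62: "x86_64", 40: "arm", 183: "aarch64"}


def binary_arch(header):
    """Best-effort architecture guess from the first bytes of a library file."""
    if header[:4] == b"\x7fELF" and len(header) >= 20:
        # Byte 5 of an ELF header gives the byte order: 1 = little, 2 = big.
        endian = "<" if header[5:6] == b"\x01" else ">"
        (machine,) = struct.unpack(endian + "H", header[18:20])
        return "ELF " + ELF_MACHINE.get(machine, "unknown(%d)" % machine)
    if len(header) >= 8:
        (magic,) = struct.unpack("<I", header[:4])
        if magic in (0xFEEDFACE, 0xFEEDFACF):  # thin little-endian Mach-O
            (cpu,) = struct.unpack("<I", header[4:8])
            return "Mach-O " + MACHO_CPU.get(cpu, "unknown(%d)" % cpu)
        if struct.unpack(">I", header[:4])[0] == 0xCAFEBABE:
            # Several architectures in one file; `file` lists each of them.
            return "Mach-O universal (fat) binary"
    return "unknown"


print("this machine:", platform.machine())
```

To use it, read the first 32 bytes of each .so or .dylib (opened in "rb" mode), pass them to binary_arch, and compare the result with what platform.machine() reports.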

Verify thread-safety MySQLdb (Python) prior to Trac installation

I'm trying to install Trac manually for the first time. I don't want to use a one-click installer like Bitnami; I want to learn how to install Trac manually, so I'm following the instructions carefully. I'm installing it on a local Windows machine for now, before installing it in a Linux environment.
As I follow the instructions carefully, I needed to install Python+MySQLDb, and I read this:
thread-safety is important
(...) verify that it is thread-safe by calling MySQLdb.thread_safe() from a standalone Python script (i.e., not under Apache). If the stand-alone test reports that MySQLdb is indeed thread-safe (...)
I've just installed MySQLDb 1.2.4 and I'd like to verify this. I've Googled but I haven't found an example about this, and I have no idea about Python. How can I verify if I've got a thread-safe installation?
Run this command. If you get 1 in the output, your installation is thread-safe.
python -c "import MySQLdb ; print MySQLdb.thread_safe()"
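Since the instructions ask for a standalone script, the same check can also live in a small file. This is just a sketch (mysqldb_thread_safety is a made-up name); it returns None instead of crashing when MySQLdb cannot be imported at all.

```python
from __future__ import print_function


def mysqldb_thread_safety():
    """Return True/False from MySQLdb.thread_safe(), or None if MySQLdb is missing."""
    try:
        import MySQLdb
    except ImportError:
        return None
    return bool(MySQLdb.thread_safe())


print("MySQLdb thread-safe:", mysqldb_thread_safety())
```

Save it under any name and run it with the same python executable that Trac will use, outside of Apache.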

Mac OSX-Lion Python-LDAP Unrecognized Symbol: _ber_pvt_opt_on

I've been trying to run a web server based on Python 2.6 code here at work. The server requires the Python LDAP libraries in order to run. Because I'm working on Mac OS X Lion, I needed to do a manual install of python-ldap 2.4.7 in order to get Python to recognize LDAP at all. Python-ldap appeared to install correctly, but when I try to run the web server I get the following error (I added some line breaks for the sake of clarity):
ImportError: dlopen(/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/site-packages/python_ldap-2.4.7-py2.6-macosx-10.7-fat.egg/_ldap.so, 2): Symbol not found: _ber_pvt_opt_on
Referenced from: /Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/site-packages/python_ldap-2.4.7-py2.6-macosx-10.7-fat.egg/_ldap.so
Expected in: flat namespace
in /Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/site-packages/python_ldap-2.4.7-py2.6-macosx-10.7-fat.egg/_ldap.so
I'm using OpenLDAP 2.4.21, and I set the following system variables before running my python-ldap installation:
export ARCHFLAGS="-arch i386"
export CFLAGS="-isysroot /Developer/SDKs/MacOSX10.7.sdk -arch i386"
export MACOSX_DEPLOYMENT_TARGET="10.7"
I should also probably mention that I've had to force everything I've installed to use 32-bit architecture instead of 64-bit in order to work around some issues with Oracle's 64-bit support. Has anyone encountered a similar situation, or do they know the significance of the "_ber_pvt_opt_on" symbol that LDAP was looking for? The number of Google results I was able to come up with was both small and unhelpful. Any light that you could shed on the situation would be greatly appreciated.
