Apache Spark: How to use Python 3 with pySpark for development - python

So far I have done the following:
import os
os.environ["SPARK_HOME"] = '/usr/local/spark/'
os.environ["PYSPARK_PYTHON"] = '/opt/conda/bin/python'
from pyspark import SparkContext
But when I run it I get the error:
ssh://vagrant@127.0.0.1:2222/opt/conda/bin/python -u /home/vagrant/src/spark.py
Traceback (most recent call last):
File "/home/vagrant/src/spark.py", line 6, in <module>
from pyspark import SparkContext
ModuleNotFoundError: No module named 'pyspark'
Even if I try to run it without the Python 3 path I get the same error.
Spark's bundled Python sources are located here:
/usr/local/spark/python
What am I doing wrong?
Ideally I want to use Python 3 for my scripts.

Try:
import sys
sys.path.append('/usr/local/spark/python')  # the directory that contains the pyspark package
or a more direct way:
sudo ln -s /usr/local/spark/python/pyspark /usr/local/lib/python2.7/site-packages
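For completeness, here is a minimal sketch (reusing the paths from the question) that makes pyspark importable under Python 3 by putting both the Spark Python sources and the bundled py4j zip on sys.path before the import:
import glob
import os
import sys

SPARK_HOME = '/usr/local/spark'
os.environ['SPARK_HOME'] = SPARK_HOME
os.environ['PYSPARK_PYTHON'] = '/opt/conda/bin/python'
# pyspark lives in $SPARK_HOME/python, not in site-packages
sys.path.append(os.path.join(SPARK_HOME, 'python'))
# pyspark also depends on the bundled py4j; the zip name varies by Spark version
sys.path.extend(glob.glob(os.path.join(SPARK_HOME, 'python', 'lib', 'py4j-*.zip')))

from pyspark import SparkContext
sc = SparkContext('local[*]', 'test')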

Related

No module error found while importing pandas

I am trying to import pandas in a Python script.
import pandas as pd
import numpy as np
But I am getting the below error:
Error from Scripts is: Script failed to run:
Error: [Traceback (most recent call last):
File "<string>", line 2, in <module>
ModuleNotFoundError: No module named 'pandas'
] (2604) (2603)
I am using this Python script in Cortex XSOAR (Demisto).
I have to sort the values in one table by a column. Google results show that I have to use pandas.DataFrame.sort_values, hence I am using it.
Please help me fix the pandas module import error, or suggest another way I can sort table values based on one column of integer values.
Thanks in advance,
NVP
Did you install pandas?
py -m pip install pandas
I would suggest checking where pandas is installed and using sys to append that path in your code.
For example:
import sys
sys.path.append(r'c:\users\NVP\appdata\local\programs\python\python39\lib\site-packages')
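If installing pandas into the XSOAR Python environment is not an option, the built-in sorted() can sort table rows by one integer column with no extra dependency; a minimal sketch, with hypothetical row data:
# a table represented as a list of dicts (hypothetical field names)
rows = [
    {'name': 'b', 'count': 3},
    {'name': 'a', 'count': 1},
    {'name': 'c', 'count': 2},
]
# sort by the integer 'count' column, largest first
rows_sorted = sorted(rows, key=lambda r: r['count'], reverse=True)
print(rows_sorted)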

ModuleNotFoundError: No module named 'h2oaicore'

I am following the Driverless AI tutorial Driverless AI Standalone Python Scoring Pipeline; you can check it at the following link:
http://docs.h2o.ai/driverless-ai/latest-stable/docs/userguide/scoring-standalone-python.html#tar-method-py
I am performing:
Running Python Scoring Process - Recommended
but when running the last step:
DRIVERLESS_AI_LICENSE_KEY = "pastekey here" SCORING_PIPELINE_INSTALL_DEPENDENCIES = 0 /path/to/your/dai-env.sh ./run_example.sh
the following error happens:
Traceback (most recent call last):
File "example.py", line 7, in <module>
from scoring_h2oai_experiment_5fd7ff9c_b11a_11eb_b91f_0242ac110002 import Scorer
File "/usr/local/lib/python3.6/dist-packages/scoring_h2oai_experiment_5fd7ff9c_b11a_11eb_b91f_0242ac110002/__init__.py", line 1, in <module>
from .scorer import Scorer
File "/usr/local/lib/python3.6/dist-packages/scoring_h2oai_experiment_5fd7ff9c_b11a_11eb_b91f_0242ac110002/scorer.py", line 7, in <module>
from h2oaicore import application_context
ModuleNotFoundError: No module named 'h2oaicore'
Hope you can help me, thanks in advance.
Apparently it doesn't find your h2oaicore module. One brute-force way to import it could be to do the following in your Python script:
import sys
import os
sys.path.append(os.path.abspath("path/to/module"))
Alternatively, you should add its location to your PYTHONPATH.
I agree with @BlackPhoenix's comment. It is looking for the h2oaicore module. The DAI Python scoring pipeline comes with an h2oaicore .whl file. Check out the shell script run_example.sh; it should contain steps for pip installing h2oaicore.
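For reference, that install step usually boils down to something like the following (a sketch; the exact wheel filename ships inside your scoring-pipeline directory and depends on the DAI version):
# run from the unpacked scoring-pipeline directory
pip install h2oaicore-*.whl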

ImportError with different python installations.

I am getting the below ImportError while running a Python file.
ImportError: No module named osgeo.gdal_array
The file starts with:
#!/usr/bin/env python
from osgeo.gdal_array import BandReadAsArray
However, if I try the same import from the command line, it runs fine.
$ which python
/home/hduser/anaconda2/bin/python
$ python
>>> from osgeo.gdal_array import BandReadAsArray
>>>
Also, please see below, where I am getting the same ImportError from a different interpreter:
$ /usr/local/bin/python2.7
>>> from osgeo.gdal_array import BandReadAsArray
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
ImportError: No module named osgeo.gdal_array
I figured out that there is something going on between the different versions of Python, but I do not want to change the original source code.
How do I make my program run without changing anything in the code, e.g. by explicitly invoking the Python installed in Anaconda?
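One approach that satisfies the constraint (a sketch; the script name myscript.py is hypothetical) is to pick the interpreter at invocation time instead of in the code, either by calling the Anaconda Python directly or by putting it first on PATH so the #!/usr/bin/env python shebang resolves to it:
# invoke the Anaconda interpreter explicitly; the script itself is unchanged
/home/hduser/anaconda2/bin/python myscript.py
# or make 'env python' resolve to Anaconda for this shell session
export PATH=/home/hduser/anaconda2/bin:$PATH
./myscript.py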

qsub python import

I'm running a job on the cluster for the first time. I run it with the following command:
qsub -cwd -S /usr/bin/python myScript.py
I have a Python script that starts with:
import time
import anotherScript
The error I get:
Traceback (most recent call last):
File "/opt/sge62/default/spool/hpc01/job_scripts/487174", line 11, in <module>
import anotherScript
ImportError: No module named anotherScript
anotherScript.py is in the same directory as myScript.py.
What can I do to solve the problem? I would appreciate any help.
Well, the problem was solved by sys.path.append(currentWorkingDirectory). However, it's definitely not a nice way to do it.
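The import fails because qsub copies the submitted script into a spool directory (note the /opt/sge62/default/spool/hpc01/job_scripts/487174 path in the traceback), so Python's usual same-directory lookup no longer sees anotherScript.py. A slightly cleaner variant of the same fix is to set the path at submission time instead of in the code, using SGE's -V flag to forward environment variables to the job:
# put the submission directory on PYTHONPATH and forward the environment
export PYTHONPATH="$PWD:$PYTHONPATH"
qsub -cwd -V -S /usr/bin/python myScript.py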

python: my package import fails on 3.1, but works on 2.6

I decided to develop my home project in Python 3.x rather than 2.x, so I checked whether it works under 3.1. I ran python3.1 from the directory above my package and then:
>>> import fathom
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "fathom/__init__.py", line 3, in <module>
from schema import Database
ImportError: No module named schema
When I enter the fathom directory, however, schema can be imported:
>>> import schema
Also, when I run python2.6 from above my package directory, I can do this:
>>> import fathom
My __init__.py has the following imports:
from schema import Database
from inspectors import PostgresInspector, SqliteInspector, MySqlInspector
Should I change something for Python 3.1?
Did you try an explicit relative import? Python 3 removed implicit relative imports, so from schema import Database is treated as an absolute import there.
from . import schema
from .inspectors import PostgresInspector
Works in Python 2.6 as well.
The 2to3 script can help you pinpoint more of these problems.
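Applied to the question's package, the fixed fathom/__init__.py would read (a sketch; explicit relative imports work under both Python 2.6 and 3.1):
# fathom/__init__.py
from .schema import Database
from .inspectors import PostgresInspector, SqliteInspector, MySqlInspector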
