Unable to import Pandas on Replit.com - Python

I'm unable to import pandas with import pandas as pd on replit.
I've already installed the package with pip install pandas and it can be seen in packages. I've successfully imported it to other projects on replit. Every time I try importing it into my code on this project, it gives me the following error:
Traceback (most recent call last):
File "main.py", line 1, in <module>
import pandas as pd
File "/home/runner/thing/venv/lib/python3.8/site-packages/pandas/__init__.py", line 16, in <module>
raise ImportError(
ImportError: Unable to import required dependencies:
numpy:
IMPORTANT: PLEASE READ THIS FOR ADVICE ON HOW TO SOLVE THIS ISSUE!
Importing the numpy C-extensions failed. This error can happen for
many reasons, often due to issues with your setup or how NumPy was
installed.
We have compiled some common reasons and troubleshooting tips at:
https://numpy.org/devdocs/user/troubleshooting-importerror.html
Please note and check the following:
* The Python version is: Python3.8 from "/home/runner/thing/venv/bin/python"
* The NumPy version is: "1.22.2"
and make sure that they are the versions you expect.
Please carefully study the documentation linked above for further help.
Original error was: libz.so.1: cannot open shared object file: No such file or directory

You don't need to use pip to install packages on repl.it -- and in fact, you shouldn't! Using Nix derivations not only works better (as you're using their OS distro the way it's designed), but also keeps their storage costs low, by allowing packages to be used from a read-only, hash-addressed, shared store.
Binaries built for other distributions might assume that there will be libraries in /lib, /usr/lib, or the like, but that's not how NixOS works: Libraries will be in a path like /nix/store/<hash>-<packagename>-<version>/lib, and those paths get embedded into the executables that use those libraries.
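As a quick illustration (a diagnostic sketch, not something the answer requires), you can ask Python's loader helper whether it can locate zlib by name at all:
# A quick probe: if this prints None, the dynamic loader cannot find zlib the
# conventional way, which is consistent with the
# "libz.so.1: cannot open shared object file" error above.
import ctypes.util
print(ctypes.util.find_library("z"))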
The easiest thing to do here is to create a new bash repl, but to add a Python interpreter to it. (I suggest this instead of using a Python repl because the way they have their Python REPLs set up adds a bunch of extra tools that need to be reconfigured; a bash repl keeps it simple).
Create a new bash repl.
Click on the three-dots menu.
Select "Show Hidden Files".
Open the file named replit.nix
Edit the file by adding a Python interpreter with pandas, as follows:
{ pkgs }: {
  deps = [
    pkgs.bashInteractive
    (pkgs.python38.withPackages (p: [p.pandas]))
  ];
}
...changing that to taste (for example, as long as the repl's Nix channel has binaries for Python 3.9 or 3.10, you can change python38 to python39 or python310).
Click the "run" button
In the new shell that opens, run python, and see that you can import pandas without trouble.
After you add a Python file to your repl, you can also change the hidden .replit file to make it run that file automatically on startup. Note that on NixOS you should use #!/usr/bin/env python as your shebang: interpreters are not installed at fixed paths like /usr/bin/python, so a PATH lookup is essential.
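For instance, a minimal script to confirm the setup (the filename and print statement are just illustrative):
#!/usr/bin/env python
# main.py (illustrative): confirm that pandas resolves through the Nix-provided interpreter
import pandas as pd
print(pd.__version__)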

Related

Pandas Installed For All Python Versions But Module Can't Be Found

I am trying to modify an AI for a game on the Steam store. The AI communicates with the game through a mod called Communication Mod, and the AI itself is a Python project. The package I am trying to modify is https://github.com/ForgottenArbiter/spirecomm and the mod is https://github.com/ForgottenArbiter/CommunicationMod.
I want to add the pandas and joblib packages as imports so I can use a model I have made for the AI. When I try to run the game + mod after adding the pandas and joblib imports, I get this error in the error log.
Traceback (most recent call last):
File "/Users/ross/downloads/spirecomm-master/main.py", line 6, in <module>
from spirecomm.ai.agent import SimpleAgent
File "/Users/ross/Downloads/spirecomm-master/spirecomm/ai/agent.py", line 10, in <module>
import pandas as pd
ModuleNotFoundError: No module named 'pandas'
This issue only happens when the game is running and the mod tries to run. If I just run the file in the terminal, it is able to compile/run and send the ready signal.
I have checked that these modules are installed, and they are. I am on an M1 Mac and I have several different versions of Python installed, but I have checked them all and the modules are installed for each of them. I have also opened the project in PyCharm and added pandas and joblib to the Python interpreter as packages.
Another thing I have tried is modifying the setup.py file to say that pandas and joblib are required. I then ran it again, but I am not sure whether this had any effect because I had already run it before.
There is limited help that can be provided without knowing the framework you are using, but hopefully this will give you some starting points.
If you are getting a "No module named 'pandas'" error, it is because you have imported pandas in your code but your Python interpreter cannot find it. There are two major reasons this can happen: either it is not installed (which you say is not the case, since you have installed it) or it is not in the location the interpreter expects (most likely).
The first thing you can do is make sure the pandas install is in the PYTHONPATH. To do this look at Permanently add a directory to PYTHONPATH?.
Secondly, you say you have several versions of Python and have installed the module for all of them, but you most likely have several instances of at least one version. Many IDEs, such as PyCharm, create a virtual environment when you create a new project and place in it a new instance of the Python interpreter (or, more accurately, a link to one version of Python). Within this virtual environment, the IDE then loads the modules it has been told to load and thus makes them available for import to code using that particular environment.
In your case I suspect you may be using a virtual environment that has not had pandas loaded into it. You will need to check your IDE's documentation if you do not know how to load it. Alternatively, you can instruct the IDE to use a different virtual environment (one that does have pandas loaded); again, search the documentation for how to do this.
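One quick way to see which interpreter and environment the mod actually launches is to log them from inside the agent code; here is a small diagnostic sketch (the log file path is just an example):
# Temporarily drop this near the top of agent.py (or main.py) to record which
# Python the Communication Mod really runs and where it looks for packages.
import sys
with open("/tmp/spirecomm_env.log", "w") as f:
    f.write("interpreter: %s\n" % sys.executable)
    f.write("search path:\n")
    for p in sys.path:
        f.write("  %s\n" % p)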
Finally, if all else fails, you can specifically tell your code where to look for the module with sys.path.append before the import:
import sys
sys.path.append('/your/pandas/module/path')  # the directory that contains the installed pandas package
import pandas

How can I resolve Python module import problems stemming from the failed import of NumPy C-extensions for running Spark/Python code on a MacBook Pro?

When I try to run the (simplified/illustrative) Spark/Python script shown below in the Mac Terminal (Bash), errors occur if imports are used for numpy, pandas, or pyspark.ml. The sample Python code shown here runs well when using the 'Section 1' imports listed below (when they include from pyspark.sql import SparkSession), but fails when any of the 'Section 2' imports are used. The full error message is shown below; part of it reads: '..._multiarray_umath.cpython-310-darwin.so' (mach-o file, but is an incompatible architecture (have 'arm64', need 'x86_64'). Apparently, there was a problem importing NumPy 'c-extensions' to some of the computing nodes. Is there a way to resolve the error so a variety of pyspark.ml and other imports will function normally? [Spoiler alert: It turns out there is! See the solution below!]
The problem could stem from one or more potential causes, I believe: (1) improper setting of the environment variables (e.g., PATH), (2) an incorrect SparkSession setting in the code, (3) an omitted but necessary Python module import, (4) improper integration of related downloads (in this case, Spark 3.2.1 (spark-3.2.1-bin-hadoop2.7), Scala (2.12.15), Java (1.8.0_321), sbt (1.6.2), Python 3.10.1, and NumPy 1.22.2) in the local development environment (a 2021 MacBook Pro (Apple M1 Max) running macOS Monterey version 12.2.1), or (5) perhaps a hardware/software incompatibility.
Please note that the existing combination of code (in more complex forms), software, and hardware runs fine for importing and processing data, displaying Spark dataframes, etc., in Terminal, as long as the imports are restricted to the basic pyspark.sql modules. Other imports seem to cause problems, and probably shouldn't.
The sample code (a simple but working program only intended to illustrate the problem):
# Example code to illustrate an issue when using locally-installed versions
# of Spark 3.2.1 (spark-3.2.1-bin-hadoop2.7), Scala (2.12.15),
# Java (1.8.0_321), sbt (1.6.2), Python 3.10.1, and NumPy 1.22.2 on a
# MacBook Pro (Apple M1 Max) running macOS Monterey version 12.2.1
# The Python code is run using 'spark-submit test.py' in Terminal
# Section 1.
# Imports that cause no errors (only the first is required):
from pyspark.sql import SparkSession
from pyspark.sql.types import *
from pyspark.sql.functions import *
# Section 2.
# Example imports that individually cause similar errors when used:
# import numpy as np
# import pandas as pd
# from pyspark.ml.feature import StringIndexer
# from pyspark.ml.feature import VectorAssembler
# from pyspark.ml.classification import RandomForestClassifier
# from pyspark.ml import *
spark = (SparkSession
         .builder
         .appName("test.py")
         .enableHiveSupport()
         .getOrCreate())
# The associated dataset is located here (but is not required to replicate the issue):
# https://github.com/databricks/LearningSparkV2/blob/master/databricks-datasets/learning-spark-v2/flights/departuredelays.csv
# Create database and managed tables
spark.sql("DROP DATABASE IF EXISTS learn_spark_db CASCADE")
spark.sql("CREATE DATABASE learn_spark_db")
spark.sql("USE learn_spark_db")
spark.sql("CREATE TABLE us_delay_flights_tbl(date STRING, delay INT, distance INT, origin STRING, destination STRING)")
# Display (print) the database
print(spark.catalog.listDatabases())
print('Completed with no errors!')
Here is the error-free output that results when only Section 1 imports are used (some details have been replaced by '...'):
MacBook-Pro ~/.../Spark2/spark-3.2.1-bin-hadoop2.7/LearningSparkGitHub/chapter4/py/src$ spark-submit test.py
[Database(name='default', description='Default Hive database', locationUri='file:/Users/.../Spark2/spark-3.2.1-bin-hadoop2.7/LearningSparkGitHub/chapter4/py/src/spark-warehouse'), Database(name='learn_spark_db', description='', locationUri='file:/Users/.../Spark2/spark-3.2.1-bin-hadoop2.7/LearningSparkGitHub/chapter4/py/src/spark-warehouse/learn_spark_db.db')]
Completed with no errors!
Here is the error that typically results when using from pyspark.ml import * or other (Section 2) imports individually:
MacBook-Pro ~/.../Spark2/spark-3.2.1-bin-hadoop2.7/LearningSparkGitHub/chapter4/py/src$ spark-submit test.py
Traceback (most recent call last):
File "/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/site-packages/numpy/core/__init__.py", line 23, in <module>
from . import multiarray
File "/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/site-packages/numpy/core/multiarray.py", line 10, in <module>
from . import overrides
File "/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/site-packages/numpy/core/overrides.py", line 6, in <module>
from numpy.core._multiarray_umath import (
ImportError: dlopen(/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/site-packages/numpy/core/_multiarray_umath.cpython-310-darwin.so, 0x0002): tried: '/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/site-packages/numpy/core/_multiarray_umath.cpython-310-darwin.so' (mach-o file, but is an incompatible architecture (have 'arm64', need 'x86_64')), '/usr/lib/_multiarray_umath.cpython-310-darwin.so' (no such file)
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/Users/.../Spark2/spark-3.2.1-bin-hadoop2.7/LearningSparkGitHub/chapter4/py/src/test.py", line 28, in <module>
from pyspark.ml import *
File "/Users/.../Spark2/spark-3.2.1-bin-hadoop2.7/python/lib/pyspark.zip/pyspark/ml/__init__.py", line 22, in <module>
File "/Users/.../Spark2/spark-3.2.1-bin-hadoop2.7/python/lib/pyspark.zip/pyspark/ml/base.py", line 25, in <module>
File "/Users/.../Spark2/spark-3.2.1-bin-hadoop2.7/python/lib/pyspark.zip/pyspark/ml/param/__init__.py", line 21, in <module>
File "/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/site-packages/numpy/__init__.py", line 144, in <module>
from . import core
File "/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/site-packages/numpy/core/__init__.py", line 49, in <module>
raise ImportError(msg)
ImportError:
IMPORTANT: PLEASE READ THIS FOR ADVICE ON HOW TO SOLVE THIS ISSUE!
Importing the numpy C-extensions failed. This error can happen for
many reasons, often due to issues with your setup or how NumPy was
installed.
We have compiled some common reasons and troubleshooting tips at:
https://numpy.org/devdocs/user/troubleshooting-importerror.html
Please note and check the following:
* The Python version is: Python3.10 from "/Library/Frameworks/Python.framework/Versions/3.10/bin/python3"
* The NumPy version is: "1.22.2"
and make sure that they are the versions you expect.
Please carefully study the documentation linked above for further help.
Original error was: dlopen(/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/site-packages/numpy/core/_multiarray_umath.cpython-310-darwin.so, 0x0002): tried: '/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/site-packages/numpy/core/_multiarray_umath.cpython-310-darwin.so' (mach-o file, but is an incompatible architecture (have 'arm64', need 'x86_64')), '/usr/lib/_multiarray_umath.cpython-310-darwin.so' (no such file)
To respond to the comment mentioned in the error message: Yes, the Python and NumPy versions noted above appear to be correct. (But it turns out the reference to Python 3.10 was misleading, as it was probably a reference to Python 3.10.1 rather than Python 3.10.2, as mentioned in Edit 1, below.)
For your reference, here are the settings currently used in the ~/.bash_profile:
export JAVA_HOME=/Library/Java/JavaVirtualMachines/jdk1.8.0_321.jdk/Contents/Home/
export SPARK_HOME=/Users/.../Spark2/spark-3.2.1-bin-hadoop2.7
export SBT_HOME=/Users/.../Spark2/sbt
export SCALA_HOME=/Users/.../Spark2/scala-2.12.15
export PATH=$JAVA_HOME/bin:$SBT_HOME/bin:$SBT_HOME/lib:$SCALA_HOME/bin:$SCALA_HOME/lib:$PATH
export PATH=$JAVA_HOME/bin:$SPARK_HOME:$SPARK_HOME/bin:$SPARK_HOME/sbin:$PATH
export PYSPARK_PYTHON=python3
export PYTHONPATH=$SPARK_HOME/python/:$PYTHONPATH
# export PYSPARK_DRIVER_PYTHON="jupyter"
# export PYSPARK_DRIVER_PYTHON_OPTS="notebook"
PATH="/Library/Frameworks/Python.framework/Versions/3.10/bin:${PATH}"
export PATH
# Misc: cursor customization, MySQL
export PS1="\h \w$ "
export PATH=${PATH}:/usr/local/mysql/bin/
# Not used, but available:
# export JAVA_HOME=/Library/Java/JavaVirtualMachines/jdk-16.0.1.jdk/Contents/Home
# export JAVA_HOME=/Library/Java/JavaVirtualMachines/jdk1.8.0_144.jdk/Contents/Home
# export PATH=$PATH:$SPARK_HOME/bin
# For use of SDKMAN!
export SDKMAN_DIR="$HOME/.sdkman"
[[ -s "$HOME/.sdkman/bin/sdkman-init.sh" ]] && source "$HOME/.sdkman/bin/sdkman-init.sh"
The following website was helpful for loading and integrating Spark, Scala, Java, sbt, and Python (versions noted above): https://kevinvecmanis.io/python/pyspark/install/2019/05/31/Installing-Apache-Spark.html. Please note that the jupyter and notebook driver settings have been commented-out in the Bash profile because they are probably unnecessary (and because at one point, they seemed to interfere with the use of spark-submit commands in Terminal).
A review of the referenced numpy.org website did not help much:
https://numpy.org/devdocs/user/troubleshooting-importerror.html
In response to some of the comments on the numpy.org website: a Python3 shell runs fine in the Mac Terminal, and pyspark and other imports (numpy, etc.) work there normally. Here is the output that results when printing the PYTHONPATH and PATH variables from Python interactively (with a few details replaced by '...'):
>>> import os
>>> print("PYTHONPATH:", os.environ.get('PYTHONPATH'))
PYTHONPATH: /Users/.../Spark2/spark-3.2.1-bin-hadoop2.7/python/:
>>> print("PATH:", os.environ.get('PATH'))
PATH: /Users/.../.sdkman/candidates/sbt/current/bin:/Library/Frameworks/Python.framework/Versions/3.10/bin:/Library/Java/JavaVirtualMachines/jdk1.8.0_321.jdk/Contents/Home//bin:/Users/.../Spark2/spark-3.2.1-bin-hadoop2.7:/Users/.../Spark2/spark-3.2.1-bin-hadoop2.7/bin:/Users/.../Spark2/spark-3.2.1-bin-hadoop2.7/sbin:/Library/Java/JavaVirtualMachines/jdk1.8.0_321.jdk/Contents/Home//bin:/Users/.../Spark2/sbt/bin:/Users/.../Spark2/sbt/lib:/Users/.../Spark2/scala-2.12.15/bin:/Users/.../Spark2/scala-2.12.15/lib:/usr/local/bin:/usr/bin:/bin:/usr/sbin:/sbin:/usr/local/MacGPG2/bin:/Library/Apple/usr/bin:/usr/local/mysql/bin/
(I am not sure which portion of this output points to a problem.)
The previously attempted remedies included these (all unsuccessful):
The use and testing of a variety of environment variables in the ~/.bash_profile
Uninstallation and reinstallation of Python and NumPy using pip3
Re-installation of Spark, Scala, Java, Python, and sbt in a (new) local dev environment
Many Internet searches on the error message, etc.
To date, no action has resolved the problem.
Edit 1
I am adding recently discovered information.
First, it appears the PYSPARK_PYTHON setting mentioned above (export PYSPARK_PYTHON=python3) was pointing toward Python 3.10.1 located in /Library/Frameworks/Python.framework/Versions/3.10/bin/python3 rather than to Python 3.10.2 in my development environment. I subsequently uninstalled Python 3.10.1 and reinstalled Python 3.10.2 (python-3.10.2-macos11.pkg) on my Mac (macOS Monterey 12.2.1), but have not yet changed the PYSPARK_PYTHON path to point toward the dev environment (suggestions would be welcome on how to do that). The code still throws errors as described previously.
Second, it may help to know a little more about the architecture of the computer, since the error message pointed to a potential hardware-software incompatibility:
/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/site-packages/numpy/core/_multiarray_umath.cpython-310-darwin.so' (mach-o file, but is an incompatible architecture (have 'arm64', need 'x86_64')
The computer is a "MacBookPro18,2" with an Apple M1 Max chip (10 cores: 8 performance, and 2 efficiency; 32-core GPU). Some websites like these (https://en.wikipedia.org/wiki/Apple_silicon#Apple_M1_Pro_and_M1_Max, https://github.com/conda-forge/miniforge/blob/main/README.md) suggest 'Apple silicon' like the M1 Max needs software designed for the 'arm64' architecture. Using Terminal on the Mac, I checked the compatibility of Python 3.10.2 and the troublesome _multiarray_umath.cpython-310-darwin.so file. Python 3.10.2 is a 'universal binary' with 2 architectures (x86_64 and arm64), and the file is exclusively arm64:
MacBook-Pro ~$ python3 --version
Python 3.10.2
MacBook-Pro ~$ whereis python3
/usr/bin/python3
MacBook-Pro ~$ which python3
/Library/Frameworks/Python.framework/Versions/3.10/bin/python3
MacBook-Pro ~$ file /Library/Frameworks/Python.framework/Versions/3.10/bin/python3
/Library/Frameworks/Python.framework/Versions/3.10/bin/python3: Mach-O universal binary with 2 architectures: [x86_64:Mach-O 64-bit executable x86_64] [arm64:Mach-O 64-bit executable arm64]
/Library/Frameworks/Python.framework/Versions/3.10/bin/python3 (for architecture x86_64): Mach-O 64-bit executable x86_64
/Library/Frameworks/Python.framework/Versions/3.10/bin/python3 (for architecture arm64): Mach-O 64-bit executable arm64
MacBook-Pro ~$ file /Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/site-packages/numpy/core/_multiarray_umath.cpython-310-darwin.so
/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/site-packages/numpy/core/_multiarray_umath.cpython-310-darwin.so: Mach-O 64-bit bundle arm64
So I am still puzzled by the error message, which says 'x86_64' is needed for something (hardware or software?) to run this script. Do you need a special environment to run PySpark scripts on an Apple M1 Max chip? As discussed previously, PySpark seems to work fine on the same computer in Python's interactive mode:
MacBook-Pro ~$ python3
Python 3.10.2 (v3.10.2:a58ebcc701, Jan 13 2022, 14:50:16) [Clang 13.0.0 (clang-1300.0.29.30)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import pyspark
>>> from pyspark.sql import SparkSession
>>> from pyspark.ml import *
>>> import numpy as np
>>>
Is there a way to resolve the error so a variety of pyspark.ml and other imports will function normally in a Python script? Perhaps the settings in the ~/.bash_profile need to be changed? Would a different version of the _multiarray_umath.cpython-310-darwin.so file solve the problem, and if so, how would I obtain it? (Use a different version of Python?) I am seeking suggestions for code, settings, and/or actions. Perhaps there is an easy fix I have overlooked.
Solved it. The errors experienced while trying to import the numpy C-extensions involved the challenge of ensuring each computing node had the environment it needed to execute the target script (test.py). It turns out this can be accomplished by packing the necessary modules (in this case, only numpy) into a tarball (.tar.gz) for use in a 'spark-submit' command to execute the Python script. The approach I used involved leveraging conda-forge/miniforge to 'pack' the required dependencies into a file. (It felt like a hack, but it worked.)
The following websites were helpful for developing a solution:
Hyukjin Kwon's blog, "How to Manage Python Dependencies in PySpark" https://databricks.com/blog/2020/12/22/how-to-manage-python-dependencies-in-pyspark.html
"Python Package Management: Using Conda": https://spark.apache.org/docs/latest/api/python/user_guide/python_packaging.html
Alex Ziskind's video "python environment setup on Apple Silicon | M1, M1 Pro/Max with Conda-forge": https://www.youtube.com/watch?v=2Acht_5_HTo
conda-forge/miniforge on GitHub: https://github.com/conda-forge/miniforge (for Apple chips, use the Miniforge3-MacOSX-arm64 download for OS X (arm64, Apple Silicon)).
Steps for implementing a solution:
Install conda-forge/miniforge on your computer (in my case, a MacBook Pro with Apple silicon), following Alex's recommendations. You do not yet need to activate any conda environment on your computer. During installation, I recommend these settings:
Do you wish the installer to initialize Miniforge3
by running conda init? [yes|no] >>> choose 'yes'
If you'd prefer that conda's base environment not be activated on startup,
set the auto_activate_base parameter to false:
conda config --set auto_activate_base false # Set to 'false' for now
After you have conda installed, cd into the directory that contains your Python (PySpark) script (i.e., the file you want to run--in the case discussed here, 'test.py').
Enter the commands recommended in the Spark documentation (see URL above) for "Using Conda." Include in the first line (shown below) a space-separated sequence of the modules you need (in this case, only numpy, since the problem involved the failure to import numpy c-extensions). This will create the tarball you need, pyspark_conda_env.tar.gz (with all of the required modules and dependencies for each computing node) in the directory where you are (the one that contains your Python script):
conda create -y -n pyspark_conda_env -c conda-forge numpy conda-pack
conda activate pyspark_conda_env
conda pack -f -o pyspark_conda_env.tar.gz
(If needed, you could insert 'pyarrow pandas numpy', etc., instead of just 'numpy' in the first line of the three shown above, if you require multiple modules such as these three. pyarrow appears to be needed for PySpark's pandas conversions rather than by pandas itself.)
The command conda activate pyspark_conda_env (above) will activate your new environment, so now is a good time to investigate which version of Python your conda environment has, and where it exists (you only need to do this once). You will need this information to set your PYSPARK_PYTHON environment variable in your ~/.bash_profile:
(pyspark_conda_env) MacBook-Pro ~$ python --version
Python 3.10.2
(pyspark_conda_env) MacBook-Pro ~$ which python
/Users/.../miniforge3/envs/pyspark_conda_env/bin/python
If you need a different version of Python, you can instruct conda to install it (see Alex's video).
Ensure your ~/.bash_profile (or similar profile) includes the following setting (filling in the exact path you just discovered):
export PYSPARK_PYTHON=/Users/.../miniforge3/envs/pyspark_conda_env/bin/python
Remember to 'source' any changes to your profile (e.g., source ~/.bash_profile) or simply restart your Terminal so the changes take effect.
Use a command similar to this to run your target script (assuming you are in the same directory discussed above). The Python script should now execute successfully, with no errors:
spark-submit --archives pyspark_conda_env.tar.gz test.py
There are several other ways to use the tarball to ensure it is automatically unpacked on the Spark executors (nodes) to run your script. See the Spark documentation discussed above, if needed, to learn about them.
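For example, the same documentation shows how to point the SparkSession at the archive programmatically instead of passing --archives on the command line; here is a minimal sketch along those lines (the archive name and the 'environment' alias follow the docs' example and are assumptions here):
import os
from pyspark.sql import SparkSession

# Use the Python interpreter unpacked from the shipped conda environment.
# The "#environment" suffix unpacks the tarball into a folder named "environment".
os.environ['PYSPARK_PYTHON'] = "./environment/bin/python"

spark = (SparkSession.builder
         .appName("test.py")
         # spark.archives distributes and unpacks the conda-pack tarball on each
         # executor (on YARN, spark.yarn.dist.archives plays the same role).
         .config("spark.archives", "pyspark_conda_env.tar.gz#environment")
         .getOrCreate())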
For clarity, here are the final ~/.bash_profile settings that worked for this installation, which included the ability to run Scala scripts in Spark. If you are not using Scala, the SBT_HOME and SCALA_HOME settings may not apply to you. Also, you may or may not need the PYTHONPATH setting. How to tailor it to your specific version of py4j is discussed in 'How to install PySpark locally' (https://sigdelta.com/blog/how-to-install-pyspark-locally/).
export JAVA_HOME=/Library/Java/JavaVirtualMachines/jdk1.8.0_321.jdk/Contents/Home
export SPARK_HOME=/Users/.../Spark2/spark-3.2.1-bin-hadoop2.7
export SBT_HOME=/Users/.../Spark2/sbt
export SCALA_HOME=/Users/.../Spark2/scala-2.12.15
export PATH=$JAVA_HOME/bin:$SBT_HOME/bin:$SBT_HOME/lib:$SCALA_HOME/bin:$SCALA_HOME/lib:$PATH
export PATH=$JAVA_HOME/bin:$SPARK_HOME:$SPARK_HOME/bin:$SPARK_HOME/sbin:$PATH
export PYSPARK_PYTHON=/Users/.../miniforge3/envs/pyspark_conda_env/bin/python
export PYTHONPATH=$SPARK_HOME/python:$SPARK_HOME/python/lib/py4j-0.10.9.3-src.zip:$PYTHONPATH
PATH="/Library/Frameworks/Python.framework/Versions/3.10/bin:${PATH}"
export PATH
# export PYSPARK_DRIVER_PYTHON="jupyter" # Not required
# export PYSPARK_DRIVER_PYTHON_OPTS="notebook" # Not required
If you have suggestions on how to improve these settings, please comment below.
Other notes:
Your Python script should still include the other imports you need (in my case, there was no need to include a numpy import in the script itself--only numpy in the tarball). So your script might include these, for example:
from pyspark.sql import SparkSession
from pyspark.sql.types import *
from pyspark.sql.functions import *
(etc.)
My script did not require this code snippet, which was shown in an example in the Spark documentation:
if __name__ == "__main__":
    main(SparkSession.builder.getOrCreate())
The approach of simply creating a 'requirements.txt' file containing a list of modules (to zip and use in a spark-submit command without using conda), as discussed in the thread "I can't seem to get --py-files on Spark to work", did not work in my case:
pip3 install -t dependencies -r requirements.txt
zip -r dep.zip dependencies # Possibly incorrect...
zip -r dep.zip . # Correct if run from within folder containing requirements.txt
spark-submit --py-files dep.zip test.py
See 'PySpark dependencies' by Daniel Corin for more details on this approach, which clearly works in certain cases:
https://blog.danielcorin.com/posts/2015-11-09-pyspark/
I'm speculating a bit, but I think this approach may not allow packages built as 'wheels', so not all the dependencies you need will be built. The Spark documentation discusses this concept under "Using PySpark Native Features." (Feel free to test it out...you will not need conda-forge/miniforge to do so.)

Using the Power BI platform for Python visualization

I wanted to use Python for importing and arranging the data and presenting it (matplotlib).
Most of the data is over time, and I wanted to provide visuals for nontechnical users, plus options to filter the data dynamically, so I thought about using Python on the Power BI platform.
I wanted advice on whether I'm on the right path with this.
And the second thing: I got stuck right at the beginning. I set everything up in Power BI as mentioned here:
https://learn.microsoft.com/en-us/power-bi/desktop-python-ide
(I use the latest Anaconda as my only environment)
but I am getting errors even for the following basic script:
import pandas as pd
basedf = pd.read_csv(r"L:\TECH\Reports\BOMf\temp\baseAll.csv")  # raw string so the backslashes are not treated as escape sequences
the error is :
unable to connect
Details: "ADO.NET: Python script error.
C:\Users\eran_r\Anaconda3\lib\site-packages\numpy\__init__.py:140: UserWarning: mkl-service package failed to import, therefore Intel(R) MKL initialization ensuring its correct out-of-the box operation under condition when Gnu OpenMP had already been loaded by Python process is not assured. Please install mkl-service package, see http://github.com/IntelPython/mkl-service
from . import _distributor_init
Traceback (most recent call last):
File "PythonScriptWrapper.PY", line 2, in
import os, pandas, matplotlib
File "C:\Users\eran_r\Anaconda3\lib\site-packages\pandas__init__.py", line 17, in
"Unable to import required dependencies:\n" + "\n".join(missing_dependencies)
ImportError: Unable to import required dependencies:
numpy:
IMPORTANT: PLEASE READ THIS FOR ADVICE ON HOW TO SOLVE THIS ISSUE!
Importing the numpy c-extensions failed.
- Try uninstalling and reinstalling numpy.
- If you have already done that, then:
1. Check that you expected to use Python3.7 from "C:\Users\eran_r\Anaconda3\python.exe",
and that you have no directories in your PATH or PYTHONPATH that can
interfere with the Python and numpy version "1.18.1" you're trying to use.
2. If (1) looks fine, you can open a new issue at
https://github.com/numpy/numpy/issues. Please include details on:
- how you installed Python
- how you installed numpy
- your operating system
- whether or not you have multiple versions of Python installed
- if you built from source, your compiler versions and ideally a build log
If you're working with a numpy git repository, try git clean -xdf
(removes all files not under version control) and rebuild numpy.
Note: this error has many possible causes, so please don't comment on
an existing issue about this - open a new one instead.
Original error was: DLL load failed: The specified module could not be found.
"

Why doesn't import work for me? - Python

Whenever I try to import a file into Python, it comes up with this error (or similar):
Traceback (most recent call last):
File "C:/Python33/My Files/username save.py", line 1, in <module>
import keyring.py
ImportError: No module named 'keyring'
I am trying to create a password-storing program, and I was looking for good ways to keep passwords secure, and someone said to use import keyring, so I did, except it never works. I must be doing something wrong, but whenever I look anything up for Python, it never works out for me. It's almost as if loads of things have been changed over the years.
Any ideas?
The keyring module is not part of the Python standard library. You need to install it first. Installation instructions are included.
Once installed, use import keyring, not import keyring.py; the latter means import the py module from the keyring package. Python imports should use just the name of the module, so not the filename with extension. Python can import code from more than just .py python files.
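Once it is installed, a minimal sanity check looks like this (the service and user names are just placeholders):
import keyring

# Store a secret in the operating system's credential store, then read it back.
keyring.set_password("my_app", "alice", "s3cr3t")
print(keyring.get_password("my_app", "alice"))  # prints: s3cr3t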
I was getting the same error "ModuleNotFoundError: No module named 'keyring'". After installing the module with pip install keyring, the same error occurred with another module name. Then I came to the conclusion that VS Code was not able to connect to my venv, even after setting the Python interpreter. Press CTRL + SHIFT + P, then type Python: Select Interpreter and select your venv, if you want to set this.
To fix the issue, I had to force VS Code to use the .venv I created, and luckily there is a dropdown to do that in the top right corner of the editor (shown in a screenshot in the original post). Click on the Python version, and then you will be able to select your virtual environment.
Now it will take the modules from your virtual environment.
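To confirm that the selected interpreter really is the one from your virtual environment, you can print it from any script you run in VS Code (a trivial check, nothing project-specific):
import sys
print(sys.executable)  # should point inside your .venv folder if the selection took effect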

Trouble importing Python modules on Ninja IDE

I have been trying to import modules into Ninja IDE for Python. These are modules that I have working in the terminal (numpy, scipy, scitools, matplotlib, and mpl_toolkits), but they will not run correctly in Ninja.
First I was only getting the message No module named ____. I checked sys.path and found that the paths were within the application; /Applications/Ninja IDE.app/Contents/Resources/lib/python2.7 was a typical path. I tried changing the path, but it doesn't seem to do anything to sys.path, even after restarting the IDE.
But I wanted the path to refer to where the modules are stored (which is /Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages). I was able to get numpy and scipy to work, as well as parts of mpl_toolkits, by adding the contents of my path to the folders that sys.path gave. However, I still can't get fully functioning modules within the Ninja IDE interpreter. I'll give some examples below of what happens when I import certain modules.
import matplotlib.pyplot
Traceback (most recent call last):
File "<console>", line 1, in <module>
File "/Applications/Ninja IDE.app/Contents/Resources/lib/python2.7/matplotlib/__init__.py", line 106, in <module>
ImportError: No module named sysconfig
import mpl_toolkits
from mpl_toolkits.mplot3d import axes3d
Traceback (most recent call last):
File "<console>", line 1, in <module>
File "/Applications/Ninja IDE.app/Contents/Resources/lib/python2.7/mpl_toolkits/mplot3d/__init__.py", line 1, in <module>
File "/Applications/Ninja IDE.app/Contents/Resources/lib/python2.7/mpl_toolkits/mplot3d/axes3d.py", line 14, in <module>
File "/Applications/Ninja IDE.app/Contents/Resources/lib/python2.7/matplotlib/__init__.py", line 106, in <module>
ImportError: No module named sysconfig
Thanks for the help. I apologize, I am very new to programming, but I did put in about a day and a half of research before posting here.
That's strange, as the sysconfig module is part of the Python 2.7 standard library.
Are you sure that Ninja is using the right Python version? Try running:
import sys
print sys.version_info
from Ninja, to see which Python version it is actually using.
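A slightly fuller check in the same Python 2 style also shows which executable and search paths the embedded console reports (purely diagnostic):
import sys
print sys.executable   # the executable the embedded interpreter reports
for p in sys.path:     # the directories it will search for modules
    print p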
I know this question is a few months old, but I wanted to post my solution in case others find it useful. I had a very similar problem, and had a lot of trouble finding a quick workable solution anywhere.
My somewhat roundabout solution was to simply create a virtualenv folder with the version of numpy I wanted, and then point the "virtualenv" property of my NinjaIDE project to that folder. I restarted NinjaIDE and boom, it instantly worked.
To set the virtualenv property for your project via the GUI, go to the Project menu:
Project > Open Project Properties > Project Execution,
and you should see a variable called "Virtualenv Folder". Point that to the folder for your virtualenv, and it should work. (May need to restart NinjaIDE.) This worked for me, NinjaIDE version 2.2 under Ubuntu 12.04.
One quick note: I actually didn't use virtualenv exactly -- I had to use a "conda env," since I am using the Anaconda distribution, and apparently it is not well-tested with virtualenv yet. (I actually got a warning when I went to easy_install virtualenv. Hadn't seen that before.)
Either way, this stackoverflow question has some nice pointers to virtualenv tutorials: Comprehensive beginner's virtualenv tutorial?
Good luck!
I was having a similar problem trying to import a module from /home/paul/lib/python using the console of the Ninja-IDE. I found out that /home/paul/lib/python didn't appear in sys.path when checking in the console of the Ninja-IDE. But it did in the terminal!
By starting the Ninja-IDE from the terminal, /home/paul/lib/python was in sys.path when checking in the console of the Ninja-IDE. I was now able to import the module I needed.
I hope this might be of some help, if not to ebris1 then maybe to others.
