psutil library installation issue on databricks - python

I am using psutil library on my databricks cluster which was running fine for last couple of weeks. When I started the cluster today, this specific library failed to install. I noticed there was a different version of psutil got updated in the site.
Currently my python script fails with 'No module psutil'
Tried installing previous version of psutil using pip install but still my code fails with the same error.
Is there any alternative to psutil or is there a way to install it in databricks

As I known, there are two ways to install a Python package in Azure Databricks cluster, as below.
As the two figures below, move to the Libraries tab of your cluster and click the Install New button to type the package name of you want to install, then wait to install successfully
Open a notebook, type the shell command as below to install a Python package via pip. Note: At here, for installing in the current environment of databricks cluster, not in the system environment of Linux, you must use /databricks/python/bin/pip, not only pip.
%sh
/databricks/python/bin/pip install psutil
Finally, I run the code below, it works for the two ways above.
import psutil
for proc in psutil.process_iter(attrs=['pid', 'name']):
print(proc.info)
psutil.pid_exists(<a pid number in the printed list above>)

In additional to #Peter response, you can also use "Library utilities" to install Python libraries.
Library utilities allow you to install Python libraries and create an environment scoped to a notebook session. The libraries are available both on the driver and on the executors, so you can reference them in UDFs. This enables:
Library dependencies of a notebook to be organized within the
notebook itself.
Notebook users with different library dependencies
to share a cluster without interference.
Example: To install "psutil" library using library utilities:
dbutils.library.installPyPI("psutil")
**Reference: **Databricks - library utilities
Hope this helps.

Related

Amazon EMR pip install in bootstrap actions runs OK but has no effect

In Amazon EMR, I am using the following script as a custom bootstrap action to install python packages. The script runs OK (checked the logs, packages installed successfully) but when I open a notebook in Jupyter Lab, I cannot import any of them. If I open a terminal in JupyterLab and run pip list or pip3 list, none of my packages is there. Even if I go to / and run find . -name mleap for instance, it does not exist.
Something I have noticed is that on the master node, I am getting all the time an error saying bootstrap action 2 has failed (there is no second action, only one). According to this, it is a rare error which I get in all my clusters. However, my cluster eventually gets created and I can use it.
My script is called aws-emr-bootstrap-actions.sh
#!/bin/bash
sudo python3 -m pip install numpy scikit-learn pandas mleap sagemaker boto3
I suspect it might have something to do with a docker image being deployed that invalidates my previous installs or something, but I think (for my Google searches) it is common to use bootstrap actions to install python packages and should work ...
The PYSPARK, Python interpreter that Spark is using, is different than the one to which the OP was installing the modules (as confirmed in comments).

How to install gdal on databricks cluster?

I am trying to install the package GDAL on an Azure Databricks cluster. In no way I can get it to work.
Approaches that I've tried but didn't work:
Via the library tab of the corresponding cluster --> Install New --> PyPi (under Library Source) --> Entered gdal under Package
Tried all approaches mentioned on https://forums.databricks.com/questions/13738/gdal-installation.html. None of them worked.
Details:
Runtime: 6.1 (includes Apache Spark 2.4.4, Scala 2.11) (When using runtime 3.5 I got GDAL to work, however an update to a higher runtime was necessary for other reasons.)
We're using python 3.7.
Finally we got it working by using an ML runtime in combination with the answer given in forums.databricks.com/answers/21118/view.html. Apparently the ML-runtimes contain conda, which is needed for the answer given in the previous link.
I have already replied similar type of question.
Please check the below link would help you to install the required library:
How can I download GeoMesa on Azure Databricks?
For your convenience I am pasting the Answer again... just you need to choose your required library from the search area.
You can install GDAL Library directly into your Databricks cluster.
1) Select the Libraries option then a new window will open.
2) Select the maven option and click on 'search packages' option
3) Search the required library and select the library/jar version and choose the 'select' option.
Thats it.
After the installation of the library/jar, restart your cluster. Now import the required classes in your Databricks notebook.
I hope it helps. Happy Coding..
pip install https://manthey.github.io/large_image_wheels/GDAL-3.1.0-cp38-cp38-manylinux2010_x86_64.whl
Looks like you are able to use this whl file and install the package but when running tasks like GDAL.Translate it will not actually run. This is the farthest I've gotten.
The above URL was found when I was searching for the binaries that GDAL needs. As a note you will have to run this every time you start your cluster.

How can I make this script run

I found this script (tutorial) on GitHub (https://github.com/amyoshino/Dash_Tutorial_Series/blob/master/ex4.py) and I am trying to run in my local machine.
Unfortunately I am having and Error
I would really appreciate if anyone can help me to run this script.
Perhaps this is something easy but I am new in coding.
Thank you!
You probably just need to pip install the dash-core-components library!
Take a look at the Dash Installation documentation. It currently recommends running these commands:
pip install dash==0.38.0 # The core dash backend
pip install dash-html-components==0.13.5 # HTML components
pip install dash-core-components==0.43.1 # Supercharged components
pip install dash-table==3.5.0 # Interactive DataTable component (new!)
pip install dash-daq==0.1.0 # DAQ components (newly open-sourced!)
For more info on using pip to install Python packages, see: Installing Packages.
If you have run those commands, and Flask still throws that error, you may be having a path/environment issue, and should provide more info in your question about your Python setup.
Also, just to give you a sense of how to interpret this error message:
It's often easiest to start at the bottom and work your way up.
Here, the bottommost message is a FileNotFound error.
The program is looking for the file in your Python37/lib/site-packages folder. That tells you it's looking for a Python package. That is the directory to which Python packages get installed when you use a tool like pip.

Module Import Error , how to refresh Ipython session?

I am a total newbie to python. I am using docker as my virtual environment.
I am trying to run this line of code on ipython
pivot_df.to_excel(os.path.expandvars('/home/user/code.xlsx'))
and I am getting the following error:
ImportError: No module named openpyxl
I installed openpyxl successfully and then tried to import openpyxl again on ipython but with no success.
Will I have to refresh my ipython session for the openpyxl to work? If yes, how do I do that? Will I lose everything I ran until now if I do that?
I don't think you need to reload as such, so import should work after you installed a package.
It may be related to some python path of virtual environment issue. Where you installed package in one python and ipython is running in another configuration.
Best thing is to execute ! pip install openpyxl from ipython itself. This will make sure you install the package in correct environment. Then it should work

Verify thread-safety MySQLdb (Python) prior to Trac installation

I'm trying to install Trac manually for the first time. I don't want to use a one-click-installer like Bitmani, I want to learn how to install Trac manually, so I'm following the instructions carefully. I'm installing it in a Windows localhost for now, before installing it in a Linux environment.
As I follow the instructions carefully, I needed to install Python+MySQLDb, and I read this:
thread-safety is important
(...) verify that it is thread-safe by calling MySQLdb.thread_safe() from a standalone Python script (i.e., not under Apache). If the stand-alone test reports that MySQLdb is indeed thread-safe (...)
I've just installed MySQLDb 1.2.4 and I'd like to verify this. I've Googled but I haven't found an example about this, and I have no idea about Python. How can I verify if I've got a thread-safe installation?
Run this command. If you get 1 in the output, your installation is threadsafe.
python -c "import MySQLdb ; print MySQLdb.thread_safe()"

Categories