Install Python pycryptodome libraries for Apache Spark in Azure Synapse

I am trying to install pycryptodome in an Azure Synapse notebook. Please find the details below.
Scenario: I have created a notebook and an Apache Spark pool in Azure Synapse. I used the command below to list the packages installed on the pool, and I don't see my required package in the list. So I tried to install it using a requirement.txt file and a requirement.yml file in the Packages section of the Apache Spark pool.
Steps performed:
pip list: command to see the packages already installed.
Created the files below and uploaded them in the Packages section of the Apache Spark pool.
requirement.txt:
pycryptodome==3.16.0
requirement.yml:
name: pycrypto_lib
channels:
  - defaults
dependencies:
  - pip:
      - pycryptodome
Error: please find the attached screenshot.
Please share your suggestions. Thanks!

If you want to install pycryptodome, follow the steps below:
Go to Manage -> Apache Spark pools -> select your pool -> Packages -> upload package_file.txt -> Apply.
Note: inside package_file.txt make sure to mention your package name:
pycryptodome==3.16.0
I tested this in my environment and successfully installed pycryptodome==3.16.0.
To verify the package version, run the code below:
import pkg_resources

# list every package visible to the interpreter on this pool
for d in pkg_resources.working_set:
    print(d)
You can then check that the package was installed successfully.
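As an extra sanity check, here is a minimal AES-GCM round trip you can run in the notebook once the pool picks up the package (a sketch, assuming pycryptodome installed as above):
from Crypto.Cipher import AES
from Crypto.Random import get_random_bytes

# encrypt and then decrypt a short message to prove the library is usable
key = get_random_bytes(16)
cipher = AES.new(key, AES.MODE_GCM)
ciphertext, tag = cipher.encrypt_and_digest(b"hello synapse")

decipher = AES.new(key, AES.MODE_GCM, nonce=cipher.nonce)
print(decipher.decrypt_and_verify(ciphertext, tag))  # b'hello synapse'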

Related

Uploading Google API python library to Lambda

I have a Jupyter notebook in which I've built a script for extracting data from a Google Sheet using these two imports:
from googleapiclient.discovery import build
from google.oauth2 import service_account
I'm trying to copy it to AWS Lambda and I'm having trouble uploading these three libraries to a layer:
google-api-python-client
google-auth-httplib2
google-auth-oauthlib
I downloaded them from pypi.org. They each have only one download option and don't specify which version of Python 3 they're compatible with, except google-api-python-client, which notes that "Python 3.7, 3.8, 3.9, 3.10 and 3.11 are fully supported and tested."
I just checked, and it looks like my Jupyter notebook is running Python 3.10. I've also copied the script into VS Code, and there the libraries also appear to work only in Python 3.10, which is odd since at least one of them should work in all versions.
It makes me think I'm doing something wrong.
Also, it doesn't look like Lambda supports 3.10. So is there no way to run the Google libraries on it? Or do I need to use older libraries?
If you don't have Python 3.9 locally, you can use Docker to run it inside a container and see which package versions you need.
FROM amazon/aws-lambda-python:3.9
RUN pip install google-api-python-client google-auth-httplib2 google-auth-oauthlib
Build it:
docker build . --progress=plain
See logs:
#5 24.65 Successfully installed cachetools-5.3.0 certifi-2022.12.7 charset-normalizer-3.0.1 google-api-core-2.11.0
google-api-python-client-2.77.0 google-auth-2.16.0
google-auth-httplib2-0.1.0 google-auth-oauthlib-1.0.0
googleapis-common-protos-1.58.0 httplib2-0.21.0 idna-3.4
oauthlib-3.2.2 protobuf-4.21.12 pyasn1-0.4.8 pyasn1-modules-0.2.8
pyparsing-3.0.9 requests-2.28.2 requests-oauthlib-1.3.1 rsa-4.9
six-1.16.0 uritemplate-4.1.1 urllib3-1.26.14
So your requirements.txt for Python 3.9 will look like:
google-api-python-client==2.77.0
google-auth-httplib2==0.1.0
google-auth-oauthlib==1.0.0
I recommend you work locally using the same version of Python and its packages. Docker is great for that!
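Once the layer built from that requirements.txt is attached to the function, a minimal handler like the sketch below confirms the imports resolve on Python 3.9 (the file and function names follow Lambda defaults; nothing else here comes from the question):
# lambda_function.py
from googleapiclient.discovery import build
from google.oauth2 import service_account

def lambda_handler(event, context):
    # reaching this return means the layer's packages imported cleanly
    return {"statusCode": 200, "body": "google libraries imported OK"}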

Flask App with Azure AD Example on Windows 10

I'm trying the example at https://github.com/Azure-Samples/ms-identity-python-webapp on Windows 10, but I get an error: ModuleNotFoundError: No module named 'flask_caching.backends.filesystem' (Flask-Caching is already installed with pip).
Versions:
Python 3.9.9,
Flask 1.1.4 and
Werkzeug 1.0.1.
I only changed the CLIENT_ID, CLIENT_SECRET and domain name in app_config.py.
Does anybody have an idea?
The error ModuleNotFoundError means the Python interpreter cannot find the library your code refers to, even though the module appears to be installed.
Common causes of this error:
Using modules meant for a different Python version, e.g. installing Python 2.x modules under Python 3.x and vice versa.
Not setting the PATH variable properly.
Or:
If you are using a Python virtual environment, the package needs to be installed after creating and activating that environment, as commented by #grumpyp; the libraries will reside inside the folder created for the virtual environment.
pip install virtualenv
A virtual environment requires activation and a dedicated installation of modules inside it, which can be done from a requirements.txt file (refer to this blog for more details):
pip install -r requirements.txt
Other reference: Set Up a Virtual Python Environment (Windows)
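To narrow down which of these causes applies, a quick diagnostic run from the environment that raises the error can show where Python is actually looking (a sketch; flask_caching is used here because it is the module in question):
import sys
import flask_caching

print(sys.executable)          # which interpreter is running
print(flask_caching.__file__)  # where flask_caching was loaded from
print(sys.path)                # everywhere Python searches for modules
If sys.executable points outside your virtual environment, the package was likely installed into a different interpreter than the one running the app.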
Or:
This may not be your exact issue, but to make things a bit easier you can set up the sample project this way and compare it with your manually configured one.
The quickstart "Add sign-in with Microsoft to a Python web app" that you are using can be configured directly from the portal quickstart, as below, where everything including the client ID and tenant ID is configured for you.
Just register the app with a name and account type, then follow the steps below for direct configuration:
Go to the quickstart page of the app
Select Python as the platform for the web application
Follow the steps to configure Azure AD inside the app directly
After following those steps, I checked the versions with pip freeze; the versions I have are Python 3.9.7, Flask 1.1.4 and Werkzeug 1.0.1.
quickstart-v2-python-webapp | microsoftdocs

How to install gdal on databricks cluster?

I am trying to install the package GDAL on an Azure Databricks cluster, but I cannot get it to work in any way.
Approaches that I've tried but didn't work:
Via the Libraries tab of the corresponding cluster --> Install New --> PyPI (under Library Source) --> entered gdal under Package
Tried all approaches mentioned on https://forums.databricks.com/questions/13738/gdal-installation.html. None of them worked.
Details:
Runtime: 6.1 (includes Apache Spark 2.4.4, Scala 2.11) (When using runtime 3.5 I got GDAL to work, however an update to a higher runtime was necessary for other reasons.)
We're using python 3.7.
Finally we got it working by using an ML runtime in combination with the answer given in forums.databricks.com/answers/21118/view.html. Apparently the ML runtimes contain conda, which is needed for the approach given in that link.
I have already replied to a similar question.
Please check the link below; it should help you install the required library:
How can I download GeoMesa on Azure Databricks?
For your convenience I am pasting the answer again; you just need to choose your required library in the search area.
You can install the GDAL library directly into your Databricks cluster.
1) Select the Libraries option; a new window will open.
2) Select the Maven option and click the 'Search Packages' option.
3) Search for the required library, select the library/jar version, and choose the 'Select' option.
That's it.
After the installation of the library/jar, restart your cluster. Now import the required classes in your Databricks notebook.
I hope it helps. Happy coding!
pip install https://manthey.github.io/large_image_wheels/GDAL-3.1.0-cp38-cp38-manylinux2010_x86_64.whl
It looks like you are able to use this whl file and install the package, but when running tasks like gdal.Translate it will not actually work. This is the farthest I've gotten.
The above URL was found when I was searching for the binaries that GDAL needs. Note that you will have to run this every time you start your cluster.
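For reference, a minimal check that the Python bindings from the wheel import and report their version (a sketch, assuming the pip install above succeeded):
from osgeo import gdal

# GDAL encodes the version as e.g. '3010000' for 3.1.0
print(gdal.VersionInfo())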

psutil library installation issue on databricks

I am using the psutil library on my Databricks cluster, and it ran fine for the last couple of weeks. When I started the cluster today, this specific library failed to install. I noticed that a different version of psutil had been published on the site.
Currently my Python script fails with 'No module named psutil'.
I tried installing a previous version of psutil using pip install, but my code still fails with the same error.
Is there any alternative to psutil, or is there a way to install it on Databricks?
As far as I know, there are two ways to install a Python package in an Azure Databricks cluster, as below.
First, move to the Libraries tab of your cluster, click the Install New button, type the name of the package you want to install, and wait for the installation to finish.
Second, open a notebook and run the shell command below to install a Python package via pip. Note: to install into the current environment of the Databricks cluster, not the system environment of Linux, you must use /databricks/python/bin/pip, not just pip.
%sh
/databricks/python/bin/pip install psutil
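Since the question mentions needing a previous version, the same command also accepts an explicit pin (the version number below is only an example):
%sh
/databricks/python/bin/pip install psutil==5.6.7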
Finally, I ran the code below, and it works with both approaches above.
import psutil

# print the pid and name of every running process
for proc in psutil.process_iter(attrs=['pid', 'name']):
    print(proc.info)

psutil.pid_exists(<a pid number in the printed list above>)
In addition to #Peter's response, you can also use library utilities to install Python libraries.
Library utilities allow you to install Python libraries and create an environment scoped to a notebook session. The libraries are available both on the driver and on the executors, so you can reference them in UDFs. This enables:
Library dependencies of a notebook to be organized within the notebook itself.
Notebook users with different library dependencies to share a cluster without interference.
Example: To install "psutil" library using library utilities:
dbutils.library.installPyPI("psutil")
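If you need to pin a specific release, installPyPI also accepts a version argument, and restarting Python afterwards makes the freshly installed package importable (the version number below is only an example):
dbutils.library.installPyPI("psutil", version="5.6.7")
dbutils.library.restartPython()  # reload the Python process so the pinned version is picked up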
Reference: Databricks - library utilities
Hope this helps.

Firebase on AWS Lambda Import Error

I am trying to connect Firebase with an AWS Lambda. I am using their firebase-admin SDK. I have installed and created the dependency package as described here, but I am getting this error on Lambda:
Unable to import module 'index':
Failed to import the Cloud Firestore library for Python.
Make sure to install the "google-cloud-firestore" module.
I have previously also tried setting up a similar function using Node.js, but I received an error message because gRPC was not configured. I think this error message might stem from that same problem. I don't know how to fix this. I have tried:
pip install grpcio -t path/to/...
and installing google-cloud-firestore, but neither fixed the problem. When I run the code from my terminal, I get no errors.
Part of the problem here is that grpcio compiles a platform-specific dynamic module, cygrpc.cpython-37m-darwin.so in my case. According to this response, you cannot import dynamic modules from a zip file: https://stackoverflow.com/a/58140801
Updating to Python 3.8 fixed this for me.
As Alex DeBrie mentioned in his article on serverless.com:
"The plugins section registers the plugin with the Framework. In the custom section, we tell the plugin to use Docker when installing packages with pip. It will use a Docker container that's similar to the Lambda environment so the compiled extensions will be compatible. You will need Docker installed for this to work."
This means the environment differs between local and Lambda, so the compiled extensions differ too. If you use a container to hold the packages installed by pip, and the container mimics the Lambda environment, everything will run well.
If you use the Serverless Framework to deploy your Python app to AWS Lambda, add these lines to the serverless.yml file:
...
plugins:
  - serverless-python-requirements
...
custom:
  pythonRequirements:
    dockerizePip: non-linux
    dockerImage: mlupin/docker-lambda:python3.9-build
...
Then serverless-python-requirements will automatically start a Docker container based on the mlupin/docker-lambda:python3.9-build image.
This container mimics the Lambda environment and lets pip install and compile everything inside it, so the compiled extensions will be compatible.
This worked in my case. Hope this helps.
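For completeness, a minimal handler sketch that exercises the failing import path once the dependencies are packaged this way (the key file name is a hypothetical placeholder, not from the question):
import firebase_admin
from firebase_admin import credentials, firestore

# initialize once per container, outside the handler
cred = credentials.Certificate("serviceAccountKey.json")  # hypothetical key path
firebase_admin.initialize_app(cred)
db = firestore.client()  # this call is what requires google-cloud-firestore and grpcio

def lambda_handler(event, context):
    return {"statusCode": 200}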
