scrapyd-deploy error: pkg_resources.DistributionNotFound - python

I have been trying for a long time to find a solution to the scrapyd error message: pkg_resources.DistributionNotFound: The 'idna<3,>=2.5' distribution was not found and is required by requests
What I have done:
$ docker pull ceroic/scrapyd
$ docker build -t scrapyd .
Dockerfile:
FROM ceroic/scrapyd
RUN pip install "idna==2.5"
$ docker build -t scrapyd .
Sending build context to Docker daemon 119.3kB
Step 1/2 : FROM ceroic/scrapyd
---> 868dca3c4d94
Step 2/2 : RUN pip install "idna==2.5"
---> Running in c0b6f6f73cf1
Downloading/unpacking idna==2.5
Installing collected packages: idna
Successfully installed idna
Cleaning up...
Removing intermediate container c0b6f6f73cf1
---> 849200286b7a
Successfully built 849200286b7a
Successfully tagged scrapyd:latest
I run the container:
$ docker run -d -p 6800:6800 scrapyd
Next:
scrapyd-deploy demo -p tutorial
And I get this error:
pkg_resources.DistributionNotFound: The 'idna<3,>=2.5' distribution was not found and is required by requests
I'm not a Docker expert, and I don't understand the logic. If idna==2.5 has been successfully installed inside the container, why does the error message require version 'idna<3,>=2.5'?

The answer is very simple, and it ended three days of torment. When I ran
scrapyd-deploy demo -p tutorial
I was running it outside the container I had created, not inside it.
The problem was solved by:
pip uninstall idna
pip install "idna == 2.5"
This had to be done on the virtual server itself, not inside the container. I can't believe I didn't understand that right away.
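Because scrapyd-deploy runs with the host's Python, it is the host's idna that has to satisfy the 'idna<3,>=2.5' pin declared by requests. Below is a minimal sketch for checking that with the same interpreter that runs scrapyd-deploy, assuming Python 3.8+ (for importlib.metadata). The helper names are hypothetical, and the version parsing is deliberately simplified (numeric releases only) instead of using the packaging library.

```python
import re
from importlib.metadata import PackageNotFoundError, version


def parse_release(ver: str) -> tuple:
    """Numeric release tuple, e.g. '2.10' -> (2, 10); suffixes like '.post1' are ignored."""
    match = re.match(r"\d+(?:\.\d+)*", ver)
    if match is None:
        raise ValueError(f"unrecognised version: {ver!r}")
    parts = [int(part) for part in match.group(0).split(".")]
    # Drop trailing zeros so that '3.0' and '3' compare as equal.
    while len(parts) > 1 and parts[-1] == 0:
        parts.pop()
    return tuple(parts)


def satisfies_idna_pin(ver: str) -> bool:
    """Check the 'idna<3,>=2.5' specifier from the error message."""
    return (2, 5) <= parse_release(ver) < (3,)


def check_local_idna() -> str:
    """Report the idna visible to *this* interpreter (the host, not the container)."""
    try:
        installed = version("idna")
    except PackageNotFoundError:
        return "idna is not installed for this interpreter"
    status = "OK" if satisfies_idna_pin(installed) else "does NOT satisfy idna<3,>=2.5"
    return f"idna {installed}: {status}"


if __name__ == "__main__":
    print(check_local_idna())
```

If this reports a mismatch, the host environment is the one that needs fixing, exactly as in the answer; rebuilding the Docker image does not change it.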

Related

Torch installation fails in Elastic Beanstalk web application

I am trying to deploy a basic web application to Elastic Beanstalk on AWS.
My app is written in Python and uses the PyTorch library so that it can load an NLP model named "bart-cnn-large", which I use for text summarization.
I have a requirements.txt file, which the EC2 instance uses to set up the virtual environment.
However, the deployment always fails when it tries to install the PyTorch library.
If I remove "torch" from requirements.txt, the installation no longer fails,
but I get this message:
None of PyTorch, TensorFlow >= 2.0, or Flax have been found. Models won't be available and only tokenizers, configuration and file/data utilities can be used.
If I leave "torch" in requirements.txt, I get this message:
2021/08/06 10:54:23.688955 [ERROR] An error occurred during execution of command [app-deploy] - [InstallDependency]. Stop running the command. Error: fail to install dependencies with requirements.txt file with error Command /bin/sh -c /var/app/venv/staging-LQM1lest/bin/pip install -r requirements.txt failed with error exit status 1. Stderr:ERROR: Invalid requirement: 'torch==1.9.0+cu102 torchvision==0.10.0+cu102 torchaudio===0.9.0' (from line 1 of requirements.txt)
The requirements.txt content:
torch==1.9.0+cu102 torchvision==0.10.0+cu102 torchaudio===0.9.0 -f https://download.pytorch.org/whl/torch_stable.html
Flask~=2.0.1
Werkzeug~=2.0.1
tika~=1.24
beautifulsoup4~=4.8.2
docx2txt~=0.8
transformers~=4.8.2
clean-text
I tried several variants of the torch line, but none of them works.
Is this a storage problem? Why won't it install?
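The log shows pip reading the whole first line as a single requirement ("Invalid requirement: 'torch==1.9.0+cu102 torchvision==0.10.0+cu102 torchaudio===0.9.0'"). pip's requirements-file format expects one requirement per line, with options such as -f/--find-links on a line of their own. Here is a minimal sketch of that split; split_requirements_line is a hypothetical helper, and it only addresses this parse error, not the storage question.

```python
from __future__ import annotations


def split_requirements_line(line: str) -> list[str]:
    """Split one over-long requirements line into one-item-per-line form.

    Hypothetical helper: '-f URL' / '--find-links URL' is kept together and
    emitted first; every other token becomes its own requirement line.
    """
    tokens = line.split()
    options: list[str] = []
    requirements: list[str] = []
    i = 0
    while i < len(tokens):
        tok = tokens[i]
        if tok in ("-f", "--find-links") and i + 1 < len(tokens):
            # An option and its URL must stay on the same line.
            options.append(f"{tok} {tokens[i + 1]}")
            i += 2
        else:
            requirements.append(tok)
            i += 1
    return options + requirements


if __name__ == "__main__":
    original = (
        "torch==1.9.0+cu102 torchvision==0.10.0+cu102 torchaudio===0.9.0 "
        "-f https://download.pytorch.org/whl/torch_stable.html"
    )
    print("\n".join(split_requirements_line(original)))
```

The printed lines can replace line 1 of requirements.txt; the rest of the file (Flask, Werkzeug, and so on) is already one requirement per line.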

docker build error when trying to pip install dict package

I only started getting this error when I moved my Docker installation to Ubuntu (everything worked fine with Docker on Windows).
When I run docker build, I get the following error while it is trying to install the Python package dict:
#14 1.681 Downloading dict-2020.7.1.tar.gz (1.8 kB)
#14 2.134 ERROR: Requested dict from https://files.pythonhosted.org/packages/67/fb/6a2458c82f59b4aad53949776608d97a46483c403df1dc20c39b413efe10/dict-2020.7.1.tar.gz#sha256=b54864077239b94e33376650824185c5aa310d3bf5089da57769f68413b6a83f has different version in metadata: '0.0.0'
(If I remove the dict package from requirements.txt, the Docker build works fine, but my application then fails to run in Docker because it can't find the dict package.)
When I check the version of the dict package on my machine, it shows 0.0.0, even though the latest version is 2020.7.1. Why?
Any suggestions on how to fix the error?
You should add --use-feature=2020-resolver to the pip install command, according to this issue: github.com/pypa/pip/issues/8707
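The error itself is a consistency check: pip expected version 2020.7.1 from the archive it downloaded, but the package's own metadata declares '0.0.0', which is presumably also why 0.0.0 shows up locally. Below is a minimal sketch of that kind of comparison. The helper names are hypothetical and this is not pip's code; pip also normalizes versions before comparing, whereas this sketch compares the strings verbatim.

```python
def version_from_sdist_name(filename: str, project: str) -> str:
    """Version implied by an sdist file name, e.g. 'dict-2020.7.1.tar.gz' -> '2020.7.1'."""
    prefix = f"{project}-"
    if not filename.startswith(prefix):
        raise ValueError(f"{filename!r} does not start with {prefix!r}")
    for suffix in (".tar.gz", ".zip"):
        if filename.endswith(suffix):
            return filename[len(prefix):-len(suffix)]
    raise ValueError(f"unsupported archive type: {filename!r}")


def check_metadata_version(filename: str, project: str, metadata_version: str) -> None:
    """Raise if the version declared in the package metadata disagrees with the file name."""
    expected = version_from_sdist_name(filename, project)
    if expected != metadata_version:
        raise ValueError(
            f"Requested {project} from {filename} has different version in metadata: "
            f"{metadata_version!r}"
        )


if __name__ == "__main__":
    try:
        check_metadata_version("dict-2020.7.1.tar.gz", "dict", "0.0.0")
    except ValueError as exc:
        print(exc)
```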

Requirements error while trying to deploy to Scrapy Cloud

I'm trying to deploy my spider to Scrapy Cloud using shub, but I keep running into the following error:
$ shub deploy
Packing version 2df64a0-master
Deploying to Scrapy Cloud project "164526"
Deploy log last 30 lines:
---> Using cache
---> 55d64858a2f3
Step 11 : RUN mkdir /app/python && chown nobody:nogroup /app/python
---> Using cache
---> 2ae4ff90489a
Step 12 : RUN sudo -u nobody -E PYTHONUSERBASE=$PYTHONUSERBASE pip install --user --no-cache-dir -r /app/requirements.txt
---> Using cache
---> 51f233d54a01
Step 13 : COPY *.egg /app/
---> e2aa1fc31f89
Removing intermediate container 5f0a6cb53597
Step 14 : RUN if [ -d "/app/addons_eggs" ]; then rm -f /app/*.dash-addon.egg; fi
---> Running in 3a2b2bbc1a73
---> af8905101e32
Removing intermediate container 3a2b2bbc1a73
Step 15 : ENV PATH /usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin
---> Running in ccffea3009a4
---> b4882513b76e
Removing intermediate container ccffea3009a4
Successfully built b4882513b76e
>>> Checking python dependencies
scrapinghub 1.9.0 has requirement six>=1.10.0, but you have six 1.7.3.
monkeylearn 0.3.5 has requirement requests>=2.8.1, but you have requests 2.3.0.
monkeylearn 0.3.5 has requirement six>=1.10.0, but you have six 1.7.3.
hubstorage 0.23.6 has requirement six>=1.10.0, but you have six 1.7.3.
Warning: Pip checks failed, please fix the conflicts.
Process terminated with exit code 1, signal None, status=0x0100
{"message": "Dependencies check exit code: 193", "details": "Pip checks failed, please fix the conflicts", "error": "requirements_error"}
{"message": "Requirements error", "status": "error"}
Deploy log location: /var/folders/w0/5w7rddxn28l2ywk5m6jwp7380000gn/T/shub_deploy_xi_w3xx8.log
Error: Deploy failed: b'{"message": "Requirements error", "status": "error"}'
It looks like a simple case of an outdated package (six). However, the installed package actually IS up to date:
$ pip show six
Name: six
Version: 1.10.0
Summary: Python 2 and 3 compatibility utilities
Home-page: http://pypi.python.org/pypi/six/
Author: Benjamin Peterson
Author-email: benjamin@python.org
License: MIT
Location: /Users/mac/.pyenv/versions/3.6.0/lib/python3.6/site-packages
Requires:
I'm running Python 3.6 through pyenv on a Mac.
Any ideas?
EDIT:
My requirements.txt file contains only the following dependency:
newspaper==0.0.9.8
EDIT 2: scrapinghub.yml
projects:
  default: 164526
requirements_file: requirements.txt
Thanks,
Simon!
I managed to solve this (with help from Scrapinghub's support forum) by adding the following to scrapinghub.yml:
stacks:
  default: scrapy:1.3-py3
and by changing requirements.txt to use the Python 3 branch of newspaper:
newspaper3k==0.1.9
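The "Checking python dependencies" step works like pip check: each declared minimum is compared against what is installed in the image the deploy builds. The six 1.7.3 and requests 2.3.0 in the log presumably come from that build image rather than from the local pyenv shown by pip show six, which would explain why switching the stack fixed it. Below is a minimal sketch of such a check, using the pins from the log. find_conflicts is a hypothetical helper, and only ">=" minimums are modelled.

```python
import re


def parse_release(ver: str) -> tuple:
    """Numeric release tuple, e.g. '1.10.0' -> (1, 10); trailing zeros are dropped."""
    match = re.match(r"\d+(?:\.\d+)*", ver)
    if match is None:
        raise ValueError(f"unrecognised version: {ver!r}")
    parts = [int(part) for part in match.group(0).split(".")]
    while len(parts) > 1 and parts[-1] == 0:
        parts.pop()
    return tuple(parts)


def find_conflicts(declared: dict, installed: dict) -> list:
    """Return pip-check-style messages for every unmet minimum version.

    declared:  {"scrapinghub 1.9.0": [("six", "1.10.0")], ...}  (dist -> minimum pins)
    installed: {"six": "1.7.3", ...}  (what the build image has, not your laptop)
    """
    messages = []
    for dist, pins in declared.items():
        for dep, minimum in pins:
            have = installed.get(dep)
            if have is None:
                messages.append(f"{dist} requires {dep}>={minimum}, which is not installed.")
            elif parse_release(have) < parse_release(minimum):
                messages.append(
                    f"{dist} has requirement {dep}>={minimum}, but you have {dep} {have}."
                )
    return messages


if __name__ == "__main__":
    declared = {
        "scrapinghub 1.9.0": [("six", "1.10.0")],
        "monkeylearn 0.3.5": [("requests", "2.8.1"), ("six", "1.10.0")],
        "hubstorage 0.23.6": [("six", "1.10.0")],
    }
    for line in find_conflicts(declared, {"six": "1.7.3", "requests": "2.3.0"}):
        print(line)
```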

M2Crypto installation fails on Amazon Beanstalk

I am trying to install the Python package "M2Crypto" via requirements.txt, and I receive the following error message:
/usr/include/openssl/opensslconf.h:36: Error: CPP #error ""This openssl-devel package does not work your architecture?"". Use the -cpperraswarn option to continue swig processing.
error: command 'swig' failed with exit status 1
I tried passing the following option setting:
option_name: SWIG_FEATURES
value: "-cpperraswarn -includeall -I/usr/include/openssl"
But the error persists. Any idea?
The following config file (placed in .ebextensions) works for me:
packages:
  yum:
    swig: []
container_commands:
  01_m2crypto:
    command: 'SWIG_FEATURES="-cpperraswarn -includeall -D`uname -m` -I/usr/include/openssl" pip install M2Crypto==0.21.1'
Make sure you don't list M2Crypto in your requirements.txt, though: Elastic Beanstalk tries to install all dependencies before running the container commands.
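The container command works by putting the SWIG flags into the SWIG_FEATURES environment variable for the pip build, with `uname -m` supplying the architecture as a preprocessor define. Here is a Python sketch of the same idea, assuming you would rather drive the install from a script than from the YAML one-liner. The helper names are hypothetical, and the version pin is copied from the answer.

```python
import os
import platform
import subprocess
import sys

# Pin copied from the answer above; keep M2Crypto out of requirements.txt.
M2CRYPTO_PIN = "M2Crypto==0.21.1"


def swig_env(base_env=None) -> dict:
    """Copy of the environment with SWIG_FEATURES set like the container command.

    On Linux, platform.machine() reports the same string as `uname -m` (e.g. 'x86_64').
    """
    env = dict(os.environ if base_env is None else base_env)
    env["SWIG_FEATURES"] = (
        f"-cpperraswarn -includeall -D{platform.machine()} -I/usr/include/openssl"
    )
    return env


def install_command() -> list:
    """pip invocation for the interpreter running this script."""
    return [sys.executable, "-m", "pip", "install", M2CRYPTO_PIN]


def install_m2crypto() -> None:
    """Hypothetical helper: run the install with the SWIG flags in place (not called here)."""
    subprocess.run(install_command(), env=swig_env(), check=True)
```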
I have found a solution that gets M2Crypto installed on Beanstalk, but it is a bit of a hack, and it is your responsibility to make sure that it is good enough for a production environment. I dropped M2Crypto from my project because this issue is ridiculous; try pycrypto if you can.
Based on (I only added python setup.py test):
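#!/bin/bash
# Build M2Crypto only if it cannot be imported yet
python -c "import M2Crypto" 2> /dev/null
if [ "$?" == 1 ]
then
    cd /tmp/
    # Old pip syntax for downloading the sdist into the current directory;
    # newer pip uses "pip download -d ." and no longer has --use-mirrors
    pip install -d . --use-mirrors M2Crypto==0.21.1
    tar xvfz M2Crypto-0.21.1.tar.gz
    cd M2Crypto-0.21.1
    # Fedora-specific build wrapper included in the M2Crypto source
    ./fedora_setup.sh build
    ./fedora_setup.sh install
    python setup.py test
fi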
In the environment config file:
commands:
  m2crypto:
    command: scripts/m2crypto.sh
    ignoreErrors: True
    test: echo '! python -c "import M2Crypto"' | bash
ignoreErrors is NOT a good idea, but I used it only to test whether the package actually gets installed, and it seems that it does.
Again, this seems to get the package installed, but I am not sure, because removing ignoreErrors causes a failure. Therefore, I won't mark this as the accepted answer, but it was far too long for a comment.
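The `test:` key makes Elastic Beanstalk run the command only when the test exits with 0. The `echo '! python -c "import M2Crypto"' | bash` line therefore means "run the build only if M2Crypto cannot be imported". Below is a Python sketch of the same test. The helper names and the idea of calling it from a separate script are assumptions, not part of the answer.

```python
import importlib


def module_importable(name: str) -> bool:
    """True if `import name` succeeds, mirroring `python -c "import M2Crypto"`."""
    try:
        importlib.import_module(name)
    except ImportError:
        return False
    return True


def eb_test_exit_code(name: str) -> int:
    """Exit status for an ebextensions `test:` key.

    0 (run the command) only if the module is NOT importable, like
    echo '! python -c "import M2Crypto"' | bash
    """
    return 0 if not module_importable(name) else 1
```

In a hypothetical helper script referenced from `test:`, this would end with `sys.exit(eb_test_exit_code("M2Crypto"))`.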

Trouble installing private github repository using pip

To preface, I have already seen the question "Is it possible to use pip to install a package from a private GitHub repository?"
I am trying to install a package from a private repository that I have access to using pip.
I am able to directly clone it like so:
(myenv)robbie@ubuntu:~/git$ git clone git@github.com:matherbk/django-messages.git
Cloning into 'django-messages'...
remote: Counting objects: 913, done.
remote: Compressing objects: 100% (345/345), done.
remote: Total 913 (delta 504), reused 913 (delta 504)
Receiving objects: 100% (913/913), 165.73 KiB, done.
Resolving deltas: 100% (504/504), done.
But when I try to install it via pip (my virtualenv is activated):
(myenv)robbie@ubuntu:~/git$ pip install git+https://git@github.com/matherbk/django-messages.git
Downloading/unpacking git+https://git@github.com/matherbk/django-messages.git
Cloning https://git@github.com/matherbk/django-messages.git to /tmp/pip-13ushS-build
Password for 'https://git@github.com':
fatal: Authentication failed
Complete output from command /usr/bin/git clone -q https://git@github.com/matherbk/django-messages.git /tmp/pip-13ushS-build:
----------------------------------------
Command /usr/bin/git clone -q https://git@github.com/matherbk/django-messages.git /tmp/pip-13ushS-build failed with error code 128 in None
Storing complete log in /home/robbie/.pip/pip.log
I tried typing in my password, but it failed. However, I am SSH-authenticated for git@github.com:
(myenv)robbie@ubuntu:~/git$ ssh -T git@github.com
Hi robpodosek! You've successfully authenticated, but GitHub does not provide shell access.
I can switch git@github.com to robpodosek@github.com, and then pip installs it just fine:
(myenv)robbie@ubuntu:~/git$ pip install git+https://robpodosek@github.com/matherbk/django-messages.git
Downloading/unpacking git+https://robpodosek@github.com/matherbk/django-messages.git
Cloning https://robpodosek@github.com/matherbk/django-messages.git to /tmp/pip-SqEan9-build
Password for 'https://robpodosek@github.com':
Running setup.py egg_info for package from git+https://robpodosek@github.com/matherbk/django-messages.git
warning: no files found matching 'README'
Installing collected packages: django-messages
Running setup.py install for django-messages
warning: no files found matching 'README'
Successfully installed django-messages
Cleaning up...
However, I want to do what the question mentioned above does and use git@github.com, so that I don't have to put my username in a requirements.txt file that goes into version control.
Any thoughts? I previously had this working but had to boot up a fresh image. Thanks in advance.
It worked by using oxyum's suggestion of changing the : to a /:
pip install git+ssh://git@github.com/matherbk/django-messages.git
Make sure you use github.com/account instead of github.com:account.
See "Git+SSH dependencies have subtle (yet critical) differences from git clone".
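The difference oxyum pointed out is purely syntactic: git clone accepts the scp-style user@host:path form, while pip needs a real git+ssh:// URL with a slash after the host. Here is a minimal sketch of that conversion; pip_ssh_url is a hypothetical helper, and it only handles the plain user@host:path form.

```python
import re

# scp-style address as accepted by `git clone`: user@host:path
SCP_STYLE = re.compile(r"^(?P<user>[\w.-]+)@(?P<host>[\w.-]+):(?P<path>[^/].*)$")


def pip_ssh_url(clone_address: str) -> str:
    """Turn 'git@github.com:account/repo.git' into 'git+ssh://git@github.com/account/repo.git'."""
    match = SCP_STYLE.match(clone_address)
    if match is None:
        raise ValueError(f"not an scp-style git address: {clone_address!r}")
    # pip needs a real URL, so the ':' after the host becomes '/'
    return f"git+ssh://{match['user']}@{match['host']}/{match['path']}"


if __name__ == "__main__":
    print(pip_ssh_url("git@github.com:matherbk/django-messages.git"))
```

The resulting line can go into requirements.txt as-is, and it contains no personal username, which was the original goal.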
I had a virtualenv activated and needed to install a series of packages from github.com listed in a text file.
(venv)$ cat requirements.txt
-e git://github.com/boto/botocore.git@develop#egg=botocore
-e git://github.com/boto/jmespath.git@develop#egg=jmespath
-e git://github.com/boto/s3transfer.git@develop#egg=s3transfer
nose==1.3.3
mock==1.3.0
wheel==0.24.0
unittest2==0.5.1; python_version == '2.6'
(venv)$ pip install -r requirements.txt
Ignoring unittest2: markers 'python_version == "2.6"' don't match your environment
Obtaining botocore from git+git://github.com/boto/botocore.git@develop#egg=botocore (from -r requirements.txt (line 1))
Cloning git://github.com/boto/botocore.git (to develop) to ./venv/src/botocore
fatal: unable to connect to github.com:
github.com[0: 192.30.253.112]: errno=Connection timed out
github.com[1: 192.30.253.113]: errno=Connection timed out
Command "git clone -q git://github.com/boto/botocore.git
/home/ubuntu/utils/boto3/venv/src/botocore" failed with error code 128 in None
However, as @Robeezy suggested, I edited requirements.txt and changed
-e git://github.com...
to
-e git+https://github.com...
That is the URL the site gives you when you clone (the only options were Clone or Download).
So, thank you! It finally worked.
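The fix above is a plain prefix swap on each VCS line. Below is a minimal sketch that applies it across a requirements file, assuming only git:// URLs (or pip's normalized git+git:// form) need changing; to_https_requirement is a hypothetical helper. The git:// protocol uses its own port (9418), which firewalls commonly block, so the timeouts were probably a network restriction rather than a pip problem.

```python
def to_https_requirement(line: str) -> str:
    """Rewrite a git:// VCS requirement to git+https://, keeping '-e', '@ref' and '#egg=' intact."""
    stripped = line.strip()
    editable = stripped.startswith("-e ")
    url = stripped[3:].strip() if editable else stripped
    # Check the longer prefix first: pip itself reports these URLs as git+git://
    for old_prefix in ("git+git://", "git://"):
        if url.startswith(old_prefix):
            new_url = "git+https://" + url[len(old_prefix):]
            return f"-e {new_url}" if editable else new_url
    return line  # not a git:// URL: leave the line untouched


def rewrite_requirements(text: str) -> str:
    """Apply to_https_requirement to every line of a requirements.txt body."""
    return "\n".join(to_https_requirement(line) for line in text.splitlines())
```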
If you're installing with pip install git+https://github.com/repo and getting this error, make sure your username and password are correct. I was getting this error because I was entering my password incorrectly.
