Unable to download scrapyd on ec2


sudo apt-get update && sudo apt-get install scrapyd
Err:7 http://archive.scrapy.org/ubuntu precise InRelease
Could not connect to archive.scrapy.org:80 (31.125.20.14), connection timed out
Err:8 http://archive.scrapy.org/ubuntu scrapy InRelease
Unable to connect to archive.scrapy.org:http:
Reading package lists... Done
N: Ignoring file 'scrapy.listecho' in directory '/etc/apt/sources.list.d/' as it has an invalid filename extension
W: Failed to fetch http://archive.scrapy.org/ubuntu/dists/precise/InRelease Could not connect to archive.scrapy.org:80 (31.125.20.14), connection timed out
W: Failed to fetch http://archive.scrapy.org/ubuntu/dists/scrapy/InRelease Unable to connect to archive.scrapy.org:http:
W: Some index files failed to download. They have been ignored, or old ones used instead.
I installed Scrapy without any errors. Now I want to install scrapyd, but it fails with the errors shown above.
I checked "Can't install Scrapyd on EC2", but there was no answer there.

Related

How to install or read Firefox binary for use with Selenium on Databricks?

I'm trying to use Selenium with Firefox on Databricks but can't get it to work, and I haven't come across any others with quite the same issues as me. The problem seems to lie with the Firefox installation.
My first approach was to install Selenium, geckodriver and Firefox within the notebook. The first two installed fine, but the Firefox installation showed some network connection issues. Here's what I ran:
Install Selenium
%pip install selenium
Install geckodriver
%sh
wget https://github.com/mozilla/geckodriver/releases/download/v0.31.0/geckodriver-v0.31.0-linux64.tar.gz -O /tmp/geckodriver.tar.gz
%sh
tar -xvzf /tmp/geckodriver.tar.gz -C /tmp
Install Firefox
%sh
/usr/bin/yes | sudo apt-get update --fix-missing
which gives the error
Ign:1 https://apt.datadoghq.com stable InRelease
Err:2 https://apt.datadoghq.com stable Release Could not handshake: Error in the pull function. [IP: xx.xxx.xx.xx xxx]
Err:3 http://security.ubuntu.com/ubuntu focal-security InRelease Connection failed [IP: xxx.xxx.xxx.xx xx]
Hit:4 https://repos.azul.com/zulu/deb stable InRelease
Err:5 http://archive.ubuntu.com/ubuntu focal InRelease Connection failed [IP: xx.xxx.xx.xx xxx]
Err:6 http://archive.ubuntu.com/ubuntu focal-updates InRelease Connection failed [IP: xxx.xxx.xxx.xx xx]
Err:7 http://archive.ubuntu.com/ubuntu focal-backports InRelease Connection failed [IP: xx.xxx.xx.xx xxx]
I tried skipping the apt-get update command and running
%sh
sudo apt-get --yes --force-yes install firefox
but it resulted in similar errors.
I later discovered that our Databricks environment is set up so that we can't send anything outside our network, so this approach to installing Firefox will not work.
My second approach was to download and install Firefox on my local machine and upload the firefox.exe file to DBFS. I installed Selenium and geckodriver as above and ran the following code.
from selenium import webdriver
from selenium.webdriver.firefox.options import Options
options = Options()
options.headless = True
options.binary_location = r'/dbfs/path/to/firefox/firefox.exe'
driver = webdriver.Firefox(options=options, executable_path = '/path/to/driver/geckodriver')
which gave the error
driver = webdriver.Firefox(options=options, executable_path = '/path/to/driver/geckodriver')
InvalidArgumentException: Message: binary is not a Firefox executable
It seems that for some reason, it can't read the firefox.exe file that I've uploaded to dbfs.
Any suggestions for possible fixes? Or are there alternative ways to get Selenium working with Firefox, taking the network restrictions into account?
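One possible diagnostic for the "binary is not a Firefox executable" error is to check what kind of executable the uploaded file actually is. The apt output above points at Ubuntu (focal) repositories, so the notebook is presumably running on Linux, while a firefox.exe downloaded on a local Windows machine is a Windows binary. The sketch below only reads the first bytes of the file and reports the format; the function name binary_format is made up for this example, and a mismatch is a likely explanation rather than a confirmed one.

```python
def binary_format(path):
    """Guess the executable format of a file from its leading magic bytes."""
    with open(path, "rb") as fh:
        magic = fh.read(4)
    if magic.startswith(b"\x7fELF"):
        return "ELF"      # native Linux executable
    if magic.startswith(b"MZ"):
        return "PE"       # Windows executable (.exe)
    return "unknown"      # scripts, archives, or anything else


# Example call (path taken from the question, not executed here):
# binary_format("/dbfs/path/to/firefox/firefox.exe")
```

If this reports "PE" on a Linux cluster, the file cannot be started there no matter where it is stored, and a Linux build of Firefox would be needed instead.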

The error messages from the apt package manager show that it could not connect to the package repositories, which indicates that your environment has not been granted access to those sites. It might be a whitelisting issue or a firewall rule that blocks the connection requests.
Also look up "headless Firefox setup"; you might need that to be able to run Firefox without the X Window System.
Hope this helps, Zoltan
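To confirm from inside the notebook whether outbound connections to the repositories are blocked, a plain TCP connection test can help. This is a minimal sketch using only the standard library; the host names and ports are copied from the apt output in the question, and the helper names can_connect and report are invented for this example.

```python
import socket


def can_connect(host, port, timeout=5.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and DNS lookup failures.
        return False


# Repository hosts that appear in the apt output of the question.
REPO_HOSTS = [
    ("archive.ubuntu.com", 80),
    ("security.ubuntu.com", 80),
    ("apt.datadoghq.com", 443),
]


def report(hosts=REPO_HOSTS):
    """Map each "host:port" string to whether it is reachable."""
    return {f"{host}:{port}": can_connect(host, port) for host, port in hosts}
```

Calling report() returns a dictionary of reachability flags. If every entry is False, the problem is the network policy rather than apt itself.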

Poetry: Max retries exceeded with url in gitlab runner

I have a simple .gitlab-ci.yml script that builds my python project.
image: python:3.9.6-slim-buster
variables:
  PIP_DEFAULT_TIMEOUT: 300
before_script:
  - pip install poetry==1.1.7
  - poetry config virtualenvs.create false
  - poetry install
When I run the CI pipeline, I intermittently get errors like the ones below, and the job fails.
First type of error:
...
• Installing toml (0.10.2)
• Installing uvloop (0.16.0)
• Installing watchgod (0.8.2)
• Installing websockets (10.3)
ConnectionError
HTTPSConnectionPool(host='pypi.org', port=443): Max retries exceeded with url: /pypi/flake8-eradicate/1.2.1/json (Caused by NewConnectionError('<urllib3.connection.HTTPSConnection object at 0x7fa5c5625dc0>: Failed to establish a new connection: [Errno -2] Name or service not known'))
at /usr/local/lib/python3.9/site-packages/requests/adapters.py:565 in send
561│ if isinstance(e.reason, _SSLError):
562│ # This branch is for urllib3 v1.22 and later.
563│ raise SSLError(e, request=request)
564│
→ 565│ raise ConnectionError(e, request=request)
566│
567│ except ClosedPoolError as e:
568│ raise ConnectionError(e, request=request)
569│
Cleaning up project directory and file based variables
00:01
ERROR: Job failed: exit code 1
Second type of error:
...
• Installing gitpython (3.1.27)
OSError
Could not find a suitable TLS CA certificate bundle, invalid path: /usr/local/lib/python3.9/site-packages/certifi/cacert.pem
at /usr/local/lib/python3.9/site-packages/requests/adapters.py:263 in cert_verify
259│ if not cert_loc:
260│ cert_loc = extract_zipped_paths(DEFAULT_CA_BUNDLE_PATH)
261│
262│ if not cert_loc or not os.path.exists(cert_loc):
→ 263│ raise OSError(
264│ f"Could not find a suitable TLS CA certificate bundle, "
265│ f"invalid path: {cert_loc}"
266│ )
267│
Cleaning up project directory and file based variables
00:00
ERROR: Job failed: exit code 1
What is most interesting is that these errors are triggered by completely different libraries at different times. I have to retry many times before everything installs reliably. What could the problem be, and how can I solve it?
For info: I use dockerized gitlab runner with docker executor on CentOS 7

I think this has more to do with Poetry itself.
It is most likely caused by parallel installations.
You can restrict Poetry to a single installer worker, although this will slow down the installation. On Poetry 1.2 and later:
poetry config installer.max-workers 1
On Poetry 1.1.x, which does not have that setting, you can turn off the parallel installer instead:
poetry config installer.parallel false
I saw a discussion in the Poetry issue tracker where they said this is less likely to happen in 1.2. Since 1.2 isn't released yet, you can try 1.2.0b1 and check.
See https://github.com/python-poetry/poetry/issues/3336 for more details.
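Since the question mentions retrying the job by hand, the retry can also be automated as a stopgap. The sketch below wraps a command in a simple retry loop; it does not fix the underlying cause, and the function name run_with_retries and its defaults are made up for this example.

```python
import subprocess
import time


def run_with_retries(cmd, attempts=3, delay=5.0):
    """Run cmd until it exits with code 0 or the attempts run out.

    Returns the exit code of the last run.
    """
    code = 1
    for attempt in range(1, attempts + 1):
        code = subprocess.run(cmd).returncode
        if code == 0:
            break
        if attempt < attempts:
            time.sleep(delay)  # brief pause before the next attempt
    return code


# Example call (not executed here):
# run_with_retries(["poetry", "install"], attempts=3, delay=10.0)
```

A wrapper like this hides intermittent failures rather than explaining them, so it is best treated as temporary.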

Regarding the
Could not find a suitable TLS CA certificate bundle, invalid path: .../site-packages/certifi/cacert.pem
This is probably caused by poetry install operating on the very environment in which Poetry itself is installed, so Poetry uninstalls its own dependencies because they aren't listed in the lock file. The CI recommendations in the installation guide warn about this explicitly:
https://python-poetry.org/docs/#ci-recommendations
If you install Poetry via pip, ensure you have Poetry installed into an isolated environment that is not the same as the target environment managed by Poetry. If Poetry and your project are installed into the same environment, Poetry is likely to upgrade or uninstall its own dependencies (causing hard-to-debug and understand errors).
You can spot this in the logs: the install step does not create a virtualenv, and it goes on to remove packages whose removal breaks the installation:
Installing dependencies from lock file
Package operations: 58 installs, 0 updates, 2 removals
• Removing certifi (2022.6.15)
• Removing setuptools (65.3.0)
Solution:
poetry config virtualenvs.create true
Also be careful when using Poetry inside tox (tox creates the venv, tox installs Poetry, Poetry removes certifi).
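To check a CI log for this pattern automatically, the "Removing" lines can be extracted and compared against packages that should not disappear. This is a rough sketch that assumes the log format shown above; the watch list and the function names are hypothetical and only cover the two packages seen in that log.

```python
import re

# Matches lines such as "• Removing certifi (2022.6.15)".
REMOVAL_RE = re.compile(r"Removing\s+(\S+)\s+\(([^)]+)\)")

# Hypothetical watch list: the two packages removed in the log above.
WATCHED = {"certifi", "setuptools"}


def removed_packages(log_text):
    """Return the names of all packages the log says were removed."""
    return [match.group(1) for match in REMOVAL_RE.finditer(log_text)]


def suspicious_removals(log_text, watched=WATCHED):
    """Return removed packages that are on the watch list."""
    return [name for name in removed_packages(log_text) if name.lower() in watched]
```

If suspicious_removals returns anything for a job log, Poetry is probably modifying the environment it runs from.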
Regarding the
Max retries exceeded with url: ... Failed to establish a new connection: [Errno -2] Name or service not known
I think this deserves a separate Q&A, but it might simply be PyPI downtime that you were unlucky enough to hit.

Apache Tika Server Failed to receive startup confirmation from startServer

I am trying to use Tika in Python to extract text from PDF files. I have Java 8 installed on my system. Despite that, I am unable to convert these PDFs into text files. Below is the code that I am using:
import glob
from tika import parser

def gettext(file):
    # glob returns backslash-separated paths on Windows, so keep only the file name
    file_name = file.split('\\')[-1]
    path = "C:/Users/user_name/PDF_Files/" + file_name
    raw = parser.from_file(path)
    name = path.split('/')[-1][:-4]  # strip the ".pdf" extension
    print(name)
    file_name = "C:/Users/user_name/PDF_Files/" + name + ".txt"
    if raw['content'] is not None:
        # open the output file once, and only when there is text to write
        with open(file_name, "w", encoding="utf-8") as text_file:
            text_file.write(raw['content'])

for file in glob.glob("C:/Users/user_name/PDF_Files/*.pdf"):
    gettext(file)
Below is the error message that I am getting after running the above code:
Error Message
2019-11-07 15:09:06,062 [MainThread ] [ERROR] Unable to run java; is it installed?
2019-11-07 15:09:06,062 [MainThread ] [ERROR] Failed to receive startup confirmation from startServer.

I faced the same issue and, after some trial and error, resolved it as follows.
On Ubuntu, check Java in the terminal (the single-dash form also works on Java 8):
java -version
Depending on the result:
1. Path not correct -> configure it.
2. Old Java version -> update it.
3. Java not available -> install it.
I used the following commands for the installation:
sudo apt update
sudo apt install default-jdk # Confirm the installation by typing y (yes) and pressing Enter.
sudo apt update
sudo apt install default-jre
sudo apt install software-properties-common
sudo add-apt-repository ppa:linuxuprising/java
sudo apt update
sudo apt install oracle-java11-installer
Finally, check again:
java -version
Now try Tika again.
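The same check can be done from Python before calling Tika, since the "Unable to run java; is it installed?" message suggests that the java executable could not be started. The sketch below looks for java on the search path and reads its version banner; the helper names find_java and java_version_text are invented for this example.

```python
import shutil
import subprocess


def find_java(search_path=None):
    """Return the full path of the java executable, or None if it is not found.

    search_path defaults to the PATH environment variable.
    """
    return shutil.which("java", path=search_path)


def java_version_text(java_path):
    """Return the banner printed by `java -version` (written to stderr)."""
    result = subprocess.run(
        [java_path, "-version"], capture_output=True, text=True
    )
    return result.stderr.strip() or result.stdout.strip()


# Example use (not executed here):
# java = find_java()
# print(java_version_text(java) if java else "java is not on PATH")
```

If find_java() returns None in the same environment that runs the script, Tika will not be able to start its server either, even if Java is installed somewhere else on the machine.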

Unable to install box2d

I run this command:
apt-get install python-box2d
and I get the following output:
E: Could not open lock file /var/lib/dpkg/lock - open (13: Permission denied)
E: Unable to lock the administration directory (/var/lib/dpkg/), are you root?

Prefix system-wide installation commands with sudo, whether they use pip or apt. sudo runs the command as root, which gives it the permission it needs to make the change (the installation).
For example:
sudo apt-get update
This refreshes your package lists, so it also needs root privileges to run. In your case the command becomes sudo apt-get install python-box2d.
This error can also occur when another installation is in progress and has locked the dpkg directory. In that case, you just have to wait for that process or installation to finish.
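When an installation is started from a script, the same rule can be applied programmatically: check whether the process is already root and add sudo only if it is not. This is a small POSIX-only sketch; needs_sudo and with_sudo are hypothetical helper names.

```python
import os


def needs_sudo(euid=None):
    """Return True when the given (or current) effective user ID is not root."""
    if euid is None:
        euid = os.geteuid()  # POSIX only
    return euid != 0


def with_sudo(cmd, euid=None):
    """Return cmd as a list, prefixed with sudo unless already running as root."""
    cmd = list(cmd)
    return ["sudo"] + cmd if needs_sudo(euid) else cmd


# Example (not executed here):
# subprocess.run(with_sudo(["apt-get", "install", "python-box2d"]))
```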

Install anaconda library from a local source

I have been trying to install pymc for some time on a Windows PC behind a very complicated proxy; effectively making this an installation on a computer not connected to the internet. I have tried - unsuccessfully - to set a proxy in the condarc file but I still get error messages
conda install -c https://conda.binstar.org/pymc pymc
Fetching package metadata: SSL verification error: [SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed (_ssl.c:645)

To solve this, download the tar file that the installer is trying to fetch (even on Windows) and then install it in offline mode.
First, run this command (which will throw an error) to determine which file needs to be downloaded:
>conda install -c https://conda.binstar.org/pymc pymc
Fetching package metadata: SSL verification error: [SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed (_ssl.c:645)
....
Solving package specifications: ......................
The following packages will be downloaded:
package | build
---------------------------|-----------------
pymc-2.3.5 | np110py35_0 402 KB defaults
The following NEW packages will be INSTALLED:
pymc: 2.3.5-np110py35_0 defaults
Proceed ([y]/n)? y
Fetching packages ...
Could not connect to https://repo.continuum.io/pkgs/free/win-64/pymc-2.3.5-np110py35_0.tar.bz2
... [error message continues]...
Now download the tar file mentioned in the error message:
https://repo.continuum.io/pkgs/free/win-64/pymc-2.3.5-np110py35_0.tar.bz2
And then run this command with the path to the tar file:
>conda install --offline C:\pymc-2.3.5-np110py35_0.tar.bz2
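If several packages fail in the same way, the URLs can be pulled out of the conda output instead of being copied by hand. The sketch below assumes the "Could not connect to ..." wording shown in the output above; the function names are made up for this example.

```python
import re

# Matches lines such as
# "Could not connect to https://repo.continuum.io/pkgs/free/win-64/pymc-2.3.5-np110py35_0.tar.bz2"
URL_RE = re.compile(r"Could not connect to (https?://\S+\.tar\.bz2)")


def failed_package_urls(conda_output):
    """Return every package URL that conda reported it could not reach."""
    return URL_RE.findall(conda_output)


def local_filename(url):
    """Return the file name part of a package URL."""
    return url.rsplit("/", 1)[-1]
```

Each URL can then be downloaded on a machine with internet access, and the resulting file passed to conda install --offline as shown above.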

Just a note: "-c conda-forge" might be handy for some other packages. For example:
conda install -c conda-forge python-levenshtein

Adding to the solution above: anyone who runs into the "SSL verification error" can turn the verification step off temporarily in ~/.condarc:
channels:
- defaults
ssl_verify: false
This way, you can install from your local hub.

I was not able to run an offline installation in Anaconda (on Win10) because Anaconda always tried to connect to remote channels, or the "collecting metadata" step failed.
To solve this I had to:
1. Download the requested module as a bz2 file (or convert a tar.gz into a tar.bz2).
2. Open the Anaconda prompt (and, if needed, navigate to the folder containing the bz2 file).
3. Run the offline installation with conda install path-to-bz2, e.g. conda install zeep-4.0.0.tar.bz2
