lxml not being recognized by bs4: Python 3 on Mac

So, I have a Mac on High Sierra and I am trying to import and use an API. This is a Python 3 API that uses bs4, and specifically uses lxml within bs4 to parse a webpage.
However, I am having trouble getting bs4 to recognize that I have lxml installed on my machine. I installed both with pip, and both appear to have installed correctly. I can write a program with ‘import bs4’ and ‘import lxml’ at the top and it compiles and runs perfectly fine. However, no matter what I do, I always get the following error when I run a program that uses this API:
bs4.FeatureNotFound: Couldn't find a tree builder with the features you requested: lxml. Do you need to install a parser library?
On top of this, when I run the following code
import lxml
import bs4
print(bs4.builder.builder_registry.builders)
the output is
[<class 'bs4.builder._htmlparser.HTMLParserTreeBuilder'>]
With no lxml listed.
I have tried everything I could find in the various Stack Overflow threads related to this. I have uninstalled and reinstalled both lxml and bs4 through various methods (pip, easy_install, manual installation, Homebrew). I've manually linked lxml from brew, and tried other things I've probably forgotten. However, I can't get it to work.
Does anyone have any ideas, or has anyone run into this before? It's possible I'm missing something small or stupid, since I've never used bs4 before, but I don't know.
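One explanation worth ruling out: `import lxml` only loads the package's small top-level module. bs4's lxml tree builder, in the bs4 versions I'm aware of, does `from lxml import etree` and registers itself only if that import succeeds. Otherwise it is skipped silently, which would leave exactly the `[HTMLParserTreeBuilder]` list shown above. The sketch below uses a hypothetical helper name, `check_lxml_for_bs4`. It checks the compiled `lxml.etree` extension directly, asks bs4's `builder_registry.lookup("lxml")` whether an lxml builder exists, and reports which interpreter is running.

```python
import importlib
import sys

def check_lxml_for_bs4():
    """Hypothetical helper: report what bs4 would need in order to use lxml."""
    report = {"python": sys.executable, "lxml_etree": "", "bs4_lxml_builder": ""}

    # bs4's lxml builder imports the compiled lxml.etree extension; a plain
    # `import lxml` succeeding does not prove this step works.
    try:
        importlib.import_module("lxml.etree")
        report["lxml_etree"] = "ok"
    except ImportError as exc:
        report["lxml_etree"] = "failed: %s" % exc

    # Ask bs4's builder registry directly whether an "lxml" builder exists.
    try:
        import bs4
        builder = bs4.builder.builder_registry.lookup("lxml")
        report["bs4_lxml_builder"] = builder.__name__ if builder else "not registered"
    except ImportError as exc:
        report["bs4_lxml_builder"] = "bs4 not importable: %s" % exc

    return report

if __name__ == "__main__":
    for key, value in check_lxml_for_bs4().items():
        print(key, "=>", value)
```

Run it with the same interpreter that runs the API. If `lxml_etree` reports a failure, the error text should say why the extension won't load. Reinstalling lxml for the interpreter shown on the `python` line (for example `python3 -m pip install --force-reinstall lxml`) is a reasonable next step.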

I'm not exactly sure what the cause is, but I had a similar case with a Flask app I'm working on. I worked around it by importing bs4 locally in the function where I needed it.
One symptom I had was that when I logged bs4.builder.builder_registry.builders at the top of my module, the logs showed two entries: first one with the proper builders, and then one with only the HTMLParser builder.
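Here is a minimal sketch of that workaround, with an added fallback that is my assumption and not part of the answer. bs4 is imported inside the function, and if bs4 raises `FeatureNotFound` for `"lxml"`, the function falls back to the built-in `"html.parser"` so the program keeps working. The function name `make_soup` is hypothetical. This does not explain why the local import helped in the answer above.

```python
def make_soup(markup):
    """Hypothetical helper: parse with lxml if bs4 can find it, else html.parser."""
    # Local import, mirroring the workaround described in the answer.
    from bs4 import BeautifulSoup, FeatureNotFound

    try:
        return BeautifulSoup(markup, "lxml"), "lxml"
    except FeatureNotFound:
        # Raised when no registered tree builder provides the "lxml" feature.
        return BeautifulSoup(markup, "html.parser"), "html.parser"

if __name__ == "__main__":
    soup, parser = make_soup("<html><head><title>Hi</title></head></html>")
    print(parser, soup.title.string)
```

Returning the parser name makes it easy to log which path was taken, so a silent fallback doesn't hide the underlying lxml problem.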

Related

Import statements not working / I think I broke Python / file structure issues?

[I am new to Python (and programming in general) and will definitely say something stupid in this question.]
I had two python programs. In one of them the import statements were working. And in the other one the import statements were not working.
I suspected this had something to do with the file location of the modules relative to the Python files.
It turned out the program that wasn't working was in a sub folder of the program that was working.
So, as an experiment, I tried moving the venv folder into the subfolder where the other program was, but I canceled that once I discovered that I would need to replace some of the files (because it already had a venv folder).
Then, as an experiment, I tried renaming the venv folder to "venv1" just to see if the good program would run. I was not surprised when it didn't.
But then I renamed it back to "venv," and it still wasn't working.
from bs4 import BeautifulSoup
import requests
import json, requests
import urllib.request
import bs4 as bs
import urllib
# .... etc ...
Output:
ModuleNotFoundError: No module named 'bs4'
...
...
...
Oh, and if I try:
#from bs4 import BeautifulSoup
import requests
import json, requests
import urllib.request
import bs4 as bs
import urllib
# .... etc ...
Output:
ModuleNotFoundError: No module named 'requests'
I tried pip installing them again (my terminal doesn't recognize sudo pip install) and this is what I got
PS C:\Users\****\Desktop> pip install requests
Requirement already satisfied: requests in c:\users\****\appdata\local\programs\python\python310\lib\site-packages (2.27.1)
I thought maybe I'd look this one up, but the folder "appdata" doesn't exist on my computer, in that location.
What happened and how can I fix it?
The appdata folder should exist in that location. It is a hidden folder, and by default, Windows won't display hidden files/folders. You can view it by pressing WIN+R, and then typing "appdata", and clicking "OK". It should then come up in a file explorer window.
The Python packages are installed, but not visible to the scripts. It sounds like your virtual environment may be incorrectly set up. If you open a CMD prompt and type python -m site, it will show you the directories on your Python's module search path (sys.path). You should see the install location for the packages; in this case, you'll probably see the following: C:\\Users\\****\\AppData\\Local\\Programs\\Python\\Python310\\lib\\site-packages.
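"Requirement already satisfied" only means that the `pip` on your PATH found the package for its own interpreter. It does not guarantee that the interpreter running your script, such as a venv's, searches that same site-packages directory. This small sketch, with hypothetical helper names, prints which interpreter is running, whether it is inside a venv, and its `sys.path`. It also builds a `python -m pip install` command bound to that exact interpreter.

```python
import sys

def in_virtualenv():
    """True when running inside a venv (sys.prefix differs from the base install)."""
    return sys.prefix != getattr(sys, "base_prefix", sys.prefix)

def pip_install_command(*packages):
    """Build a pip command tied to the interpreter that is running this script."""
    return [sys.executable, "-m", "pip", "install", *packages]

if __name__ == "__main__":
    print("Interpreter:", sys.executable)
    print("Inside a venv:", in_virtualenv())
    print("Module search path (sys.path):")
    for entry in sys.path:
        print("   ", entry)
    # Paths containing spaces would need quoting before pasting into a shell.
    print("To install for this interpreter, run:")
    print("   ", " ".join(pip_install_command("requests", "beautifulsoup4")))
```

Note that the pip package for bs4 is named `beautifulsoup4`. Running the printed command installs into the interpreter that will actually execute the script.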

No module named sublime

I'm trying to work on some old code that used a library I'm not very familiar with anymore, and I'm getting the 'No module named' error. I'm using the Sublime Text editor on Windows. I had previously edited some settings in the build system to run C++, if that might affect it.
If the library in particular matters, it is youtube_dl.
I have tried $pip install --upgrade youtube_dl but it hasn't changed anything.
The code that is throwing an error is import youtube_dl
I searched around on the internet, and I found an answer.
(Edited for clarity)
"Sublime is a module only available in Sublime Text embedded Python interpreters.
import sublime will only work from a plugin or inside the console, not on your system Python interpreter."
The forum post is here.
Not sure if this will help, as I do not know if you are trying to import sublime, but this is literally all the info I could find.
Hopefully this helps!
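A Sublime Text build runs your script with an external Python, not the embedded plugin interpreter. Which executable it uses depends on your build system settings, and that is my assumption about this setup. One way to check is to run the sketch below as a build. It prints the interpreter the build actually uses and where, if anywhere, it would import each module from. The helper name `module_status` is hypothetical. It relies only on `importlib.util.find_spec`.

```python
import importlib.util
import sys

def module_status(name):
    """Return the file `name` would be imported from, or None if it can't be found."""
    spec = importlib.util.find_spec(name)
    return spec.origin if spec is not None else None

if __name__ == "__main__":
    print("Build is running:", sys.executable)
    # 'sublime' is provided only inside Sublime Text's embedded plugin
    # interpreter, so on a normal system Python this is expected to be None.
    print("sublime:", module_status("sublime"))
    print("youtube_dl:", module_status("youtube_dl"))
```

If `youtube_dl` shows `None` here but pip says it is installed, then the build and pip are using different interpreters.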

Python BS4 error: No module named html.entities

So I installed BS4 with pip, but when I do
from bs4 import BeautifulSoup
I get this
ImportError: No module named html.entities
I've researched this as much as I can, as it seems to be a fairly rare problem. The only fix I could find is to have only one version of Python installed, but I need both Python 2 and Python 3 installed, so that wouldn't work.
So I was wondering if there's a better way to fix this error. I have read http://www.crummy.com/software/BeautifulSoup/bs4/doc/#problems-after-installation, by the way, and there appeared to be nothing there that would help my situation.
Thanks for your help! And sorry for my formatting
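`html.entities` exists only in Python 3. Its Python 2 counterpart is `htmlentitydefs`. This error therefore usually means that a Python 3 copy of bs4 is being imported by a Python 2 interpreter, which is easy to do with both versions installed. The sketch below shows the version-dependent import and prints which interpreter is running. Note that the class is imported as `from bs4 import BeautifulSoup`.

```python
import sys

# html.entities is the Python 3 name; Python 2 called the same module htmlentitydefs.
try:
    from html.entities import name2codepoint
except ImportError:
    from htmlentitydefs import name2codepoint

def describe_interpreter():
    major, minor = sys.version_info[:2]
    return "Python %d.%d at %s" % (major, minor, sys.executable)

if __name__ == "__main__":
    print(describe_interpreter())
    print("&amp; maps to code point", name2codepoint["amp"])
```

If the first line says Python 2, install and run with the same Python 3 explicitly, for example `python3 -m pip install beautifulsoup4` followed by `python3 your_script.py`.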

urllib2 package missing from PyPI

I wanted to install the urllib2 package from PyPI, but it is not available.
It seems that it has been updated to urllib3, but is there any way to download urllib2?
import urllib2
Is that what you want?
urllib2 is part of the Python 2 standard library, and any module documented at http://docs.python.org/ can be imported without installing it.
Update 1:
If you need the source code...
The official CPython code: http://hg.python.org/cpython/file/3b5fdb5bc597/Lib/urllib
Note: The urllib2 module has been split across several modules in Python 3 named urllib.request and urllib.error. The 2to3 tool will automatically adapt imports when converting your sources to Python 3.
Or try this: http://code.reddit.com/docs/urllib2-pysrc.html
I can't guarantee the integrity for the second alternative link.
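Following the note about the split into `urllib.request` and `urllib.error`, code that used urllib2 can usually switch imports instead of installing anything. Here is a minimal sketch that works on both versions. The `fetch` helper and the User-Agent string are hypothetical. `contextlib.closing` is used because Python 2's urllib2 responses don't support `with` directly.

```python
from contextlib import closing

try:
    # Python 3: urllib2's pieces now live in urllib.request and urllib.error.
    from urllib.request import Request, urlopen
    from urllib.error import HTTPError, URLError
except ImportError:
    # Python 2: the original urllib2 module from the standard library.
    from urllib2 import HTTPError, Request, URLError, urlopen

def fetch(url, timeout=10):
    """Hypothetical helper: return the response body as bytes, or None on failure."""
    request = Request(url, headers={"User-Agent": "example-script"})
    try:
        with closing(urlopen(request, timeout=timeout)) as response:
            return response.read()
    except HTTPError as exc:  # must come first: HTTPError subclasses URLError
        print("HTTP error:", exc.code)
    except URLError as exc:  # DNS failure, refused connection, etc.
        print("Could not reach server:", exc.reason)
    return None
```

For example, `fetch("https://www.python.org/")` returns the page bytes, and on Python 3 a `data:` URL such as `fetch("data:text/plain,hello")` returns `b"hello"` without touching the network.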

Missing lxml module in Python?

I want to use the python-docx library to process Word files. Its docx.py references lxml, as I assume from
from lxml import etree
When I start the script, I get the error:
No module named lxml
Is this part of the standard library? If so, why isn't it found? I'm using IronPython 2.7 RC1.
You need to install lxml which is not part of the stdlib. I don't know if it will work with IronPython though.
Update: Seems like it might be non-trivial to get lxml working with IronPython. See this question:
How to get lxml working under IronPython?
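lxml is third-party, but the standard library ships `xml.etree.ElementTree`, which shares much of `lxml.etree`'s core API (`fromstring`, `find`, `findtext`, and so on). This won't help python-docx itself, which imports lxml directly. For your own XML code, though, a common pattern is to prefer lxml and fall back to the standard library, as sketched here.

```python
try:
    from lxml import etree  # third-party; relies on a compiled C extension
except ImportError:
    # Standard-library fallback with a largely compatible core API.
    import xml.etree.ElementTree as etree

root = etree.fromstring("<doc><p>hello</p></doc>")
print(root.tag, root.findtext("p"))
```

Only the shared subset is safe to use this way. lxml-only features such as full XPath or `getparent()` are not available in ElementTree.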
