When I write findAll it says: findAll is not defined [closed] - python

Closed. This question needs debugging details. It is not currently accepting answers.
Edit the question to include desired behavior, a specific problem or error, and the shortest code necessary to reproduce the problem. This will help others answer the question.
Closed 4 years ago.
I am trying to write my first data-scraping script. However, this happens whenever I try to find all the tr tags in the HTML:
I wrote: match = findAll("tr",{"class":"match"})
This comes up: NameError: name 'findAll' is not defined

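If I am assuming right and you are using
from bs4 import BeautifulSoup
then find_all (and its older camelCase alias findAll) is a method of bs4.element.Tag, which the BeautifulSoup object inherits from. It is not a standalone function, so calling findAll on its own raises NameError. Call it on the parsed object instead:
obj = BeautifulSoup(html_text, 'html.parser')
match = obj.find_all("tr", {"class": "match"})
This should solve your problem.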
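To see this in a self-contained form, here is a minimal sketch in which a small, made-up HTML snippet stands in for the downloaded page (the table content is hypothetical). It calls find_all on the parsed soup, and also shows the class_ keyword, which Beautiful Soup accepts as an alternative to the attribute dictionary.

```python
from bs4 import BeautifulSoup

# Hypothetical sample HTML standing in for the page you downloaded.
html_text = """
<table>
  <tr class="match"><td>Team A</td><td>Team B</td></tr>
  <tr class="header"><td>ignored</td></tr>
  <tr class="match"><td>Team C</td><td>Team D</td></tr>
</table>
"""

soup = BeautifulSoup(html_text, "html.parser")

# find_all is a method of the parsed object, not a free function.
matches = soup.find_all("tr", {"class": "match"})

# class_ is the keyword form of the same filter ("class" is reserved in Python).
same = soup.find_all("tr", class_="match")

for row in matches:
    # Collect the text of every cell in the matching row.
    print([td.get_text() for td in row.find_all("td")])
```

Both calls return the same two rows; the header row is skipped because its class is not "match".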

Related

How to extract text between two markers? [closed]

Closed 10 months ago.
I am trying to create a blogging site, somewhat like medium.com. The only problem I am facing is the headings in the blog posts. I want to do what Stack Overflow does with bold text:
**text**
But I can't figure out how to implement this. Sorry if this question is not detailed enough, and thanks for giving this question your time.
You can find the string between two substrings with a Python regular expression:
import re
s = '**text**'
result = re.search(r'\*\*(.*)\*\*', s)
print(result.group(1))  # output: text
You really should learn regular expressions 😀.
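The greedy (.*) works for a single bold span, but on a line with two spans it swallows everything between the first ** and the last **. A minimal sketch with a non-greedy group, using re.findall to collect every span and re.sub to turn them into HTML; wrapping them in <strong> tags is an assumption about how the blog would render bold text.

```python
import re

s = "**Intro** some text **Details**"

# Greedy: .* runs to the LAST **, so both spans come back as one.
greedy = re.search(r"\*\*(.*)\*\*", s).group(1)

# Non-greedy: .+? stops at the FIRST closing **, giving one match per span.
BOLD = re.compile(r"\*\*(.+?)\*\*")
spans = BOLD.findall(s)

# Hypothetical rendering step: wrap each span in <strong> tags.
html_out = BOLD.sub(r"<strong>\1</strong>", s)

print(greedy)    # Intro** some text **Details
print(spans)     # ['Intro', 'Details']
print(html_out)  # <strong>Intro</strong> some text <strong>Details</strong>
```

For text typed by users, escape it first with html.escape so that any HTML they type is shown literally rather than rendered.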

How can I apply BeautifulSoup on a part of a HTML tag collected in a CSV? [closed]

Closed 1 year ago.
I have a CSV in which one column contains, on each row, a string holding a full HTML table.
I want to navigate those tables and extract the TDs that correspond to certain THs. BeautifulSoup raises an error saying it can't read strings, only HTML.
What should I do? Is Beautiful Soup really the best way?
If you want to read an HTML string and parse the data, a better option might be the etree module from lxml:
from lxml import etree
tree = etree.fromstring(your_html_string)
Note that etree.fromstring uses lxml's XML parser, so it raises an error if the HTML is not well-formed XML; lxml.html.fromstring parses such markup leniently.
You can then query the tree by passing an XPath expression for the desired elements:
tds = tree.xpath('the/xpath/to/your/tds')
If you want the text within the tds:
tds = tree.xpath('the/xpath/to/your/tds/text()')
Here is the link to the documentation:
https://lxml.de/parsing.html
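A minimal end-to-end sketch of the question's setup: a CSV column holding one HTML table per row, with the <td> pulled out for a given <th>. The CSV content, the column name table_html, and the helper td_for_th are all hypothetical. It parses each cell with lxml.html.fromstring and passes the header text as an XPath variable ($header) instead of pasting it into the expression.

```python
import csv
import io

from lxml import html

# Hypothetical CSV: the table_html column holds a whole HTML table per row.
csv_text = '''id,table_html
1,"<table><tr><th>Name</th><td>Widget</td></tr><tr><th>Price</th><td>9.99</td></tr></table>"
2,"<table><tr><th>Name</th><td>Gadget</td></tr><tr><th>Price</th><td>4.50</td></tr></table>"
'''

def td_for_th(table_html, header):
    """Return the text of each <td> in rows whose <th> text equals header."""
    tree = html.fromstring(table_html)
    # $header is an XPath variable; lxml fills it from the keyword argument.
    return tree.xpath(".//tr[th[normalize-space()=$header]]/td/text()", header=header)

prices = {}
for row in csv.DictReader(io.StringIO(csv_text)):
    prices[row["id"]] = td_for_th(row["table_html"], "Price")

print(prices)  # {'1': ['9.99'], '2': ['4.50']}
```

Reading the file with csv.DictReader keeps each HTML cell as one string, even if the markup contains commas inside the quoted field.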

How to fix 'NoneType' object is not subscriptable

Closed 18 hours ago.
I am trying to get product information from Amazon.com.
I searched the internet for some code and found a script. I've tried to fix its bugs, but I am stuck on the one below. Something seems to be wrong with the last line of this code, because I get this error: 'NoneType' object is not subscriptable
for asin in asin_array:
    item_array = []  # An array to store details of a single product.
    amazon_url = "https://www.amazon.com/dp/" + asin  # The general structure of a URL
    response = session.get(amazon_url, headers=headers, verify=False)  # Get the response
    item_array.append(response.html.search('a-color-price">${}<')[0])  # Extract the price
Because response.html.search(...) returned None: the pattern was not found in the page you received.
Basically, you are trying to do this:
print(None[0])
and that is not possible.
Check the response, or the tag you are searching for; it may not be present in the page.
You can check the result before extracting:
price_data = response.html.search('a-color-price">${}<')
if price_data is not None:
    item_array.append(price_data[0])
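The same guard can be tried offline. This sketch uses the standard re module as a stand-in for the response.html.search call in the question (which appears to come from requests-html); like that call, re.search returns None when the pattern is missing. The pages and the extract_price helper are hypothetical.

```python
import re

# Stand-in for response.html.search(...): re.search also returns None
# when the pattern does not occur in the page.
PRICE = re.compile(r'a-color-price">\$([^<]+)<')

def extract_price(page_html):
    """Return the price text, or None if the price element is missing."""
    match = PRICE.search(page_html)
    if match is None:  # guard before indexing into the result
        return None
    return match.group(1)

# Hypothetical pages: one with a price, one without.
pages = ['<span class="a-color-price">$19.99</span>', "<html>Robot check</html>"]

item_array = []
for page in pages:
    price = extract_price(page)
    if price is not None:
        item_array.append(price)

print(item_array)  # ['19.99']
```

Returning None from the helper keeps the decision about missing data (skip, log, retry) in the calling loop.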

Selecting data from the list based on specific part of the string [closed]

Closed 5 years ago.
I highly appreciate your efforts to help me!
I am working on a small scraping project of mine. I have got a list of links from a webpage, and now I want to select only the links from the navigation that are related to products.
Is there a way to select all the links in the array that have "product" in them? For example, I want all the links from the website navigation that contain the word "blog".
I'd appreciate an answer.
I suggest using the Beautiful Soup library. Assuming the page you want to scrape is stored as an HTML string in html, you could do the following:
from bs4 import BeautifulSoup
b = BeautifulSoup(html, 'lxml')
links = [i['href'] for i in b.find_all('a', href=True) if "blog" in i['href']]
This builds a list of the href attribute of every link on the page and keeps only those that contain the string "blog". Passing href=True skips <a> tags that have no href, which would otherwise raise a KeyError.
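Since the question says the links are already in a list, the filtering step itself needs no parser. A minimal sketch, assuming the links are absolute URL strings (the sample URLs and the links_containing helper are hypothetical); it matches only against the URL path, case-insensitively, so a word that happens to appear in the domain name does not cause false hits.

```python
from urllib.parse import urlparse

# Hypothetical list of links already collected from the page.
links = [
    "https://example.com/products/shoes",
    "https://example.com/blog/spring-sale",
    "https://example.com/about",
    "https://example.com/Products/hats",
]

def links_containing(urls, word):
    """Keep URLs whose path contains word, ignoring case."""
    word = word.lower()
    return [u for u in urls if word in urlparse(u).path.lower()]

print(links_containing(links, "product"))
print(links_containing(links, "blog"))
```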

Decode HTML string using Python 2.7 [closed]

Closed 6 years ago.
I want to decode the following string so that the escape sequences become HTML tags:
\u003cp\u003e\u003cstrong\u003e\u003cspan\u003eAbout the Company \u003c/span\u003e\u003c/strong\u003e\u003c/p\u003e
How can I do that in Python 2.7?
I have a large HTML string to decode; the sample above is just part of it.
PS: I have tried many solutions from the web for decoding HTML strings, but none of them helped.
EDIT:
I tried https://stackoverflow.com/a/2087433/4350834
and got this result:
\u003cp\u003e\u003cstrong\u003e\u003cspan\u003eAbout the Company \u003c/span\u003e\u003c/strong\u003e\u003c/p\u003e
In Python 2.7 you can decode the escapes with the unicode-escape codec:
>>> text = "\u003cp\u003e\u003cstrong\u003e\u003cspan\u003eAbout the Company \u003c/span\u003e\u003c/strong\u003e\u003c/p\u003e".decode('unicode-escape')
>>> print text
<p><strong><span>About the Company </span></strong></p>
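The answer is Python 2 code; in Python 3, str objects have no .decode() method. A sketch of the same conversion in Python 3, assuming the escapes arrive as literal backslash sequences (a raw string stands in for the scraped text here).

```python
# A raw string keeps the literal backslash-u sequences, as in the scraped text.
escaped = r"\u003cp\u003e\u003cstrong\u003e\u003cspan\u003eAbout the Company \u003c/span\u003e\u003c/strong\u003e\u003c/p\u003e"

# str has no .decode() in Python 3, so go through bytes first.
decoded = escaped.encode("ascii").decode("unicode_escape")

print(decoded)  # <p><strong><span>About the Company </span></strong></p>
```

unicode_escape treats any non-ASCII bytes as Latin-1, so this is only safe for ASCII input like the sample; .encode("ascii") raises an error rather than silently mangling other text. If the string is a JSON value, json.loads decodes the \uXXXX escapes as well.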
