Closed. This question needs details or clarity. It is not currently accepting answers.
Closed 1 year ago.
I'm using Beautiful Soup to scrape a web page.
I am trying to scrape data from https://painel-covid19.saude.ma.gov.br/vacinas, but the tags in my output are empty. In Inspect Element I can see the data, but not in the page source. You can see that the code is hidden in . How can I retrieve it using Python? Can someone help me?
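The issue isn't that the data is "not visible". The issue is that the data is filled in by JavaScript after the page loads, so it is not part of the HTML the server sends. You won't see the data unless the page's JavaScript is executed. You can do that with the selenium package, which drives a real browser such as Chrome to do the rendering; the rendered HTML can then be parsed with Beautiful Soup.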
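A minimal sketch of that approach, assuming Selenium 4 with Chrome installed locally. fetch_rendered_html and extract_texts are hypothetical helper names, and the CSS selector you pass in is something you would read off Inspect Element; it is not taken from the dashboard.

```python
from typing import List, Optional

from bs4 import BeautifulSoup


def fetch_rendered_html(url: str, wait_for_css: Optional[str] = None, timeout: float = 20.0) -> str:
    """Open url in headless Chrome and return the DOM after JavaScript has run."""
    # Selenium is imported here so the parsing helper below works without it.
    from selenium import webdriver
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    options = webdriver.ChromeOptions()
    options.add_argument("--headless=new")  # render without opening a window
    driver = webdriver.Chrome(options=options)
    try:
        driver.get(url)
        if wait_for_css:
            # Wait until the JavaScript-inserted element exists (TimeoutException otherwise).
            WebDriverWait(driver, timeout).until(
                EC.presence_of_element_located((By.CSS_SELECTOR, wait_for_css))
            )
        return driver.page_source
    finally:
        driver.quit()


def extract_texts(html: str, css_selector: str) -> List[str]:
    """Parse already-rendered HTML with Beautiful Soup and return the matching elements' text."""
    soup = BeautifulSoup(html, "html.parser")
    return [tag.get_text(strip=True) for tag in soup.select(css_selector)]
```

For example, html = fetch_rendered_html("https://painel-covid19.saude.ma.gov.br/vacinas", wait_for_css="span.value") followed by extract_texts(html, "span.value"), where span.value is a placeholder selector. Only the browser step needs Selenium; the parsing stays plain Beautiful Soup.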
Closed 2 days ago.
I'm trying to scrape data from Glassdoor, but Beautiful Soup can't read the data at all.
So far I've tried this:
import requests
from bs4 import BeautifulSoup
html_text=requests.get('https://www.glassdoor.co.in/Job/data-analyst-jobs-SRCH_KO0,12.htm?fromAge=7').text
soup1=BeautifulSoup(html_text,'lxml')
soup2=soup1.prettify()
jobs=soup1.find_all('li',class_='react-job-listing css-7x0jr eigr9kq3')
print(jobs)
I've seen solutions using Selenium, but is there any other way to get the actual data? I've tried this with the ul class, the li class and so on, but nothing seems to work.
There is no li tag with the class react-job-listing css-7x0jr eigr9kq3 in the HTML that requests receives from that URL.
Look at the downloaded HTML itself (for example by printing soup1.prettify()) to find what you need to scrape.
Only search for tags and attributes that actually appear in that HTML; what Inspect Element shows in the browser can differ from it.
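One way to check what the downloaded HTML really contains is to count the class lists of its li tags. This is a sketch; li_class_counts and li_with_class are hypothetical helper names, and react-job-listing is simply the class from the question. It also shows soup.select("li.react-job-listing"), which matches any li carrying that one class, whereas class_= with the full space-separated string only matches that exact attribute value.

```python
from collections import Counter
from typing import List

from bs4 import BeautifulSoup


def li_class_counts(html: str) -> Counter:
    """Count how often each class list appears on <li> tags in the HTML you downloaded."""
    soup = BeautifulSoup(html, "html.parser")
    # bs4 returns the class attribute as a list of class names, or None if it is absent.
    return Counter(" ".join(li.get("class") or []) for li in soup.find_all("li"))


def li_with_class(html: str, css_class: str) -> List:
    """Return <li> tags that carry css_class, whatever other classes they also have."""
    soup = BeautifulSoup(html, "html.parser")
    return soup.select(f"li.{css_class}")
```

For example, print(li_class_counts(html_text).most_common(10)) with the html_text from the question. If no react-job-listing entry shows up at all, the listings are probably added by JavaScript and are not in the static HTML.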
Closed 1 year ago.
For example, when you open a website in Chrome, open DevTools and go to the Sources tab, how would I get everything that is listed there myself with Python? That is, how would I download it with Python?
There are a few popular ways to interact with web content in Python. One is to control a real browser through automation, for example with Selenium, which lets you click on and extract elements from a web page.
An alternative is to download the page with a library like requests and parse it in your Python script with Beautiful Soup. This is usually preferred when you don't want to depend on an actual browser (for example on a server where no browser is installed). See the official Beautiful Soup documentation for more.
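A rough sketch of the download-and-parse route, assuming you only need the page itself plus the scripts, stylesheets and images referenced directly in its HTML. It uses the standard library's urllib.request for the download instead of requests to avoid an extra dependency; download_html and static_resource_urls are hypothetical names. Files that the page's JavaScript fetches later also show up in the Sources tab but will not be found this way.

```python
import urllib.request
from typing import List
from urllib.parse import urljoin

from bs4 import BeautifulSoup


def download_html(url: str, path: str) -> str:
    """Download the page HTML and save a copy to path."""
    with urllib.request.urlopen(url) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        html = response.read().decode(charset, errors="replace")
    with open(path, "w", encoding="utf-8") as f:
        f.write(html)
    return html


def static_resource_urls(html: str, base_url: str) -> List[str]:
    """List scripts, images and stylesheets referenced in the HTML, as absolute URLs."""
    soup = BeautifulSoup(html, "html.parser")
    urls = []
    # <script src> and <img src>, in document order; inline scripts have no src.
    for tag in soup.find_all(["script", "img"], src=True):
        urls.append(urljoin(base_url, tag["src"]))
    # Stylesheets are <link rel="stylesheet" href="...">; bs4 returns rel as a list.
    for tag in soup.find_all("link", href=True):
        if "stylesheet" in (tag.get("rel") or []):
            urls.append(urljoin(base_url, tag["href"]))
    return urls
```

Each URL returned by static_resource_urls can then be downloaded the same way as the page itself.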
Closed 5 years ago.
Given an HTTP/HTTPS page, I would like to search for some links on it. Does anyone know how to do this with Bash, Python or another popular scripting language?
Try this in Python. It prints every <a> tag that has an href attribute:
import requests
from bs4 import BeautifulSoup as soup
print(soup(requests.get('Your link').content, 'html.parser').find_all('a', href=True))
You should use Beautiful Soup, an HTML parser library for Python. Look for the <a> tags and read their href attribute, which holds the link target; the tag's text is only the visible label.
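To search for particular links rather than print every tag, the href values can be resolved to absolute URLs and filtered. A small sketch; find_links and its contains filter are hypothetical names chosen for illustration.

```python
from typing import List
from urllib.parse import urljoin

from bs4 import BeautifulSoup


def find_links(html: str, base_url: str, contains: str = "") -> List[str]:
    """Return the absolute href of every <a> tag, optionally only those containing a substring."""
    soup = BeautifulSoup(html, "html.parser")
    links = []
    for a in soup.find_all("a", href=True):  # href=True skips <a> tags without an href
        url = urljoin(base_url, a["href"])   # resolve relative links against the page URL
        if contains in url:
            links.append(url)
    return links
```

For example, find_links(requests.get(url).text, url, contains=".pdf") would list only the links whose URL contains ".pdf".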
Closed 5 years ago.
How can I click on this link by its id, class and href with Python Selenium? I'm using PyCharm. *The href changes.
I tried:
driver.find_element_by_xpath("//div[@id='result_26']//a[@class='a-link-normal s-access-detail-page s-color-twister-title-link a-text-normal']/@href").click()
You are clicking the href attribute instead of the <a> tag. Selenium can only return elements, so an XPath that selects an attribute node fails.
Assuming your XPath is otherwise correct, just remove the /@href from the end so it selects the <a> element itself, and it should work.
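Since the href keeps changing, the link can be located by the div id and one of the link's classes instead. A sketch assuming Selenium 4, where find_element_by_xpath has been removed (since 4.3) in favour of driver.find_element(By.XPATH, ...); here an explicit wait is used instead so the click only happens once the link is clickable. result_link_xpath and click_result_link are hypothetical names, and contains(@class, ...) is a substring match, so pick a class name that is specific enough.

```python
def result_link_xpath(result_id: str, link_class: str) -> str:
    """Build an XPath to the <a> inside div#result_id, located by class rather than href."""
    # @ selects attributes in XPath; contains() tolerates the other classes on the element.
    return f"//div[@id='{result_id}']//a[contains(@class, '{link_class}')]"


def click_result_link(driver, result_id: str, link_class: str, timeout: float = 10.0) -> None:
    """Wait until the link is clickable, then click the element itself (not its href)."""
    # Imported here so the XPath helper can be used without Selenium installed.
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    xpath = result_link_xpath(result_id, link_class)
    link = WebDriverWait(driver, timeout).until(
        EC.element_to_be_clickable((By.XPATH, xpath))
    )
    link.click()
```

With an existing driver, click_result_link(driver, "result_26", "s-access-detail-page") clicks the link regardless of its current href.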
Closed 6 years ago.
I want to extract the HTML from a webpage:
import urllib2
req = urllib2.Request('https://www.example.com')
response = urllib2.urlopen(req)
fullhtml = response.read()
I tried urllib2, but since the page is built dynamically, the HTML content is empty.
Is there a way to wait for the JavaScript to load?
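Take a look at PhantomJS (http://phantomjs.org/), a headless browser that executes the page's JavaScript. Many websites build their content with JavaScript, and plain HTTP libraries in PHP or Python, such as urllib2, only download the raw HTML without running any scripts. Note that PhantomJS development was suspended in 2018; headless Chrome or Firefox driven by Selenium is the usual replacement today.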
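A similar wait can be done with Selenium driving a headless browser. By default driver.get() already returns only after the initial page load, so what usually needs waiting for is the data the scripts insert afterwards. The sketch below, assuming Selenium 4, polls until an element matching a CSS selector has non-empty text; wait_for_text is a hypothetical name and the selector is up to you.

```python
def wait_for_text(driver, css_selector: str, timeout: float = 20.0) -> str:
    """Poll the live page until an element matching css_selector has non-empty text."""
    # Imported here so this module can be imported without Selenium installed.
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support.ui import WebDriverWait

    def first_nonempty_text(d):
        for element in d.find_elements(By.CSS_SELECTOR, css_selector):
            text = element.text.strip()
            if text:
                return text  # a truthy value stops the wait and is returned by until()
        return False  # falsy: WebDriverWait polls again (every 0.5 s by default)

    # Raises selenium.common.exceptions.TimeoutException if nothing appears in time.
    return WebDriverWait(driver, timeout).until(first_nonempty_text)
```

For example, after driver.get("https://www.example.com"), calling wait_for_text(driver, "#content") and then reading driver.page_source gives the HTML once the script has filled in #content, where #content is a placeholder selector.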