I fetch a link with requests.get, and when I check r.history it's empty, even though the link redirects to another address when I open it in my browser. What is the problem?
import requests
r = requests.get('http://dir.iran.ir/home?p_p_id=webdirectorydisplay_WAR_webdirectoryportlet&p_p_lifecycle=0&p_p_state=exclusive&p_p_mode=view&_webdirectorydisplay_WAR_webdirectoryportlet_itemEntryId=14439&_webdirectorydisplay_WAR_webdirectoryportlet_cmd=redirectToLink')
result = r.history
but result is an empty list, and the final link in the browser is http://www.dps.ir/.
You should check what that URL actually returns first:
>>> r.content
'<script type="text/javascript">window.location.href="http://www.dps.ir";</script> '
The requests library doesn't provide the ability to execute JavaScript, so that explains why there is no history: the redirect happens in the browser via window.location, not as an HTTP 3xx response.
P.S.: By the way, you could give PhantomJS a shot.
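If all you need is the final address, a minimal workaround (assuming this endpoint always returns the redirect as a window.location.href assignment, which is not guaranteed) is to pull the target URL out of the returned script yourself:
import re
import requests

url = ('http://dir.iran.ir/home?p_p_id=webdirectorydisplay_WAR_webdirectoryportlet'
       '&p_p_lifecycle=0&p_p_state=exclusive&p_p_mode=view'
       '&_webdirectorydisplay_WAR_webdirectoryportlet_itemEntryId=14439'
       '&_webdirectorydisplay_WAR_webdirectoryportlet_cmd=redirectToLink')
r = requests.get(url)

# The "redirect" is a JavaScript assignment, not an HTTP 3xx response,
# so extract the target from the returned script by hand.
match = re.search(r'window\.location\.href\s*=\s*"([^"]+)"', r.text)
if match:
    print(match.group(1))  # http://www.dps.ir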
I'm using Beautiful Soup to scrape a webpage.
I am trying to scrape data from https://painel-covid19.saude.ma.gov.br/vacinas, but the tags in my output come back empty. In Inspect Element I can see the data, but not in the page source. How can I retrieve it using Python? Can someone help me?
The issue isn't that the data is "not visible". The issue is that the data is being filled in by JavaScript code. You won't see the data unless you execute the JavaScript on the page. You can do that with the selenium package, which drives a real browser (such as Chrome) to do the rendering.
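A minimal sketch of that approach, assuming Chrome and a matching chromedriver are installed; the tag name passed to find_all is only a placeholder you would replace after inspecting the page:
import time

from bs4 import BeautifulSoup
from selenium import webdriver

driver = webdriver.Chrome()
driver.get('https://painel-covid19.saude.ma.gov.br/vacinas')
time.sleep(5)  # crude wait so the JavaScript has time to load the data

# page_source now contains the DOM after the scripts have run
soup = BeautifulSoup(driver.page_source, 'html.parser')
driver.quit()

for tag in soup.find_all('div'):  # 'div' is a placeholder; use the real tags/classes
    print(tag.get_text(strip=True))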
I am currently working on a Python project and I would like to know how to serve a .txt file to a browser. For example: accessing page.html should make a GET request for a file with the same name, which the browser then downloads. No front-end processing is needed for the received file. Thanks!
You have to change the MIME type of the content. Take a look at the MDN docs linked below for the list of currently accepted content types. One option that I would recommend is application/octet-stream; a sketch using it follows.
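Here is a minimal sketch with the standard-library http.server; the file name page.txt and the port are placeholders:
from http.server import BaseHTTPRequestHandler, HTTPServer

class TxtDownloadHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        with open('page.txt', 'rb') as f:  # 'page.txt' is only an example name
            data = f.read()
        self.send_response(200)
        # application/octet-stream tells the browser to download instead of render
        self.send_header('Content-Type', 'application/octet-stream')
        self.send_header('Content-Disposition', 'attachment; filename="page.txt"')
        self.send_header('Content-Length', str(len(data)))
        self.end_headers()
        self.wfile.write(data)

HTTPServer(('localhost', 8000), TxtDownloadHandler).serve_forever()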
For example, I tried getting Python to read the following filtered page
http://www.hearthpwn.com/cards?filter-attack-val=1&filter-attack-op=1&display=1
but Python only gets the unfiltered page http://www.hearthpwn.com/cards instead.
The standard library urllib2 normally follows redirects. If retrieving this URL used to work without being redirected, then the site has changed.
Although you can prevent following the redirect within urllib2 (by providing an alternative HTTP handler), I recommend using requests, where you can do:
import requests
r = requests.get('http://www.hearthpwn.com/cards?filter-attack-val=1'
                 '&filter-attack-op=1&display=1', allow_redirects=False)
print(r)
giving you:
<Response [302]>
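With the redirect suppressed, you can also check where the site wanted to send you by looking at the response's Location header, e.g.:
print(r.headers.get('Location'))  # the URL the 302 points to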
I want to extract the HTML from a webpage:
import urllib2
req = urllib2.Request('https://www.example.com')
response = urllib2.urlopen(req)
fullhtml = response.read()
I tried with urllib2, but since the page is built dynamically, the HTML content is empty.
Is there a way to wait for the javascript to load?
Take a look at PhantomJS (http://phantomjs.org/). Many websites render their content with JavaScript, which PHP or Python alone cannot execute. I think this library will be the best you can get.
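An alternative sketch of the same idea uses Selenium (mentioned in an earlier answer) with an explicit wait so the JavaScript has time to run; the locator and timeout are placeholders you would adapt to the page:
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait

driver = webdriver.Chrome()
driver.get('https://www.example.com')

# Wait up to 10 seconds for an element the JavaScript is expected to add;
# (By.TAG_NAME, 'body') is a placeholder locator -- use one specific to the page
WebDriverWait(driver, 10).until(
    EC.presence_of_element_located((By.TAG_NAME, 'body')))

fullhtml = driver.page_source  # the rendered HTML
driver.quit()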
So let's say I have this URL: https://www.python.org/
and I want to download the page's source into a .txt file named python_source.txt. How would I do that?
Use urllib2. Here's how it's done:
import urllib2
url = 'https://www.python.org/'
response = urllib2.urlopen(url)
content = response.read()
Now you can save the content in any text file.
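For example, continuing from the content variable above (Python 2, where read() returns a plain str):
with open('python_source.txt', 'w') as f:
    f.write(content)  # write the raw page source to the requested file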
The Python standard-library module urllib does just this. The documentation gives a very clear example of what you want to do.
import urllib.request
local_filename, headers = urllib.request.urlretrieve('http://python.org/')
html = open(local_filename)
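If you would rather save it straight to the name from the question instead of a temporary file, urlretrieve also accepts a target filename (Python 3, as above):
import urllib.request
urllib.request.urlretrieve('https://www.python.org/', 'python_source.txt')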