How do I prevent a python script from being redirected from a specific web page? [closed] - python

Closed. This question needs details or clarity. It is not currently accepting answers.
Closed 6 years ago.
For example, I tried getting Python to read the following filtered page
http://www.hearthpwn.com/cards?filter-attack-val=1&filter-attack-op=1&display=1
but Python only gets the unfiltered page http://www.hearthpwn.com/cards instead.

The standard library urllib2 normally follows redirects. If retrieving this URL used to work without being redirected, then the site has changed.
Although you can prevent following the redirect within urllib2 (by providing an alternative HTTP handler), I recommend using requests, where you can do:
import requests
r = requests.get('http://www.hearthpwn.com/cards?filter-attack-val=1'
                 '&filter-attack-op=1&display=1', allow_redirects=False)
print(r)
giving you:
<Response [302]>
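The alternative-handler route mentioned above can be sketched in Python 3's urllib (the successor to urllib2). This is an illustrative sketch, not the only way to do it:

```python
import urllib.request

class NoRedirectHandler(urllib.request.HTTPRedirectHandler):
    # Returning None instead of a new Request tells urllib not to
    # follow the redirect; the 3xx response is then raised as an
    # HTTPError whose .code you can inspect.
    def redirect_request(self, req, fp, code, msg, headers, newurl):
        return None

opener = urllib.request.build_opener(NoRedirectHandler)
# opener.open('http://www.hearthpwn.com/cards?filter-attack-val=1'
#             '&filter-attack-op=1&display=1') would now raise an
# HTTPError with code 302 instead of silently landing on /cards.
```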

Related

Extract full HTML of a website by using pyppeteer in python [closed]

Closed. This question needs to be more focused. It is not currently accepting answers.
Closed last year.
I'm using the below code to extract full HTML:
cont = await page1.content()
The website I intend to extract from is:
https://www.mohmal.com/en
which is a website for making temporary email accounts. What I want to do is read the content of received emails, but with the code above I could not extract the HTML of the inner frame where the received email contents are placed. How can I do that?
Did you try using urllib?
You can use the urllib module to read HTML pages.
from urllib.request import urlopen
f = urlopen("https://www.google.com")
print(f.read())
f.close()
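For the iframe problem in the question itself, pyppeteer exposes every frame on the page via `page.frames`, and each frame has its own `content()`. A rough sketch (not tested against mohmal.com, and the function name is mine):

```python
import asyncio

async def frame_htmls(url):
    # pyppeteer is imported inside the function so the sketch can be
    # defined even without the package installed
    from pyppeteer import launch
    browser = await launch()
    page = await browser.newPage()
    await page.goto(url)
    # page.frames lists the main frame plus every <iframe>;
    # frame.content() returns that frame's full HTML
    htmls = [await f.content() for f in page.frames]
    await browser.close()
    return htmls

# asyncio.get_event_loop().run_until_complete(frame_htmls('https://www.mohmal.com/en'))
```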

How to get final url with python? [closed]

Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 4 years ago.
I fetch a link with requests.get, and when I check r.history it is empty, although the link redirects to another address when I open it in my browser. What is the problem?
import requests
r=requests.get('http://dir.iran.ir/home?p_p_id=webdirectorydisplay_WAR_webdirectoryportlet&p_p_lifecycle=0&p_p_state=exclusive&p_p_mode=view&_webdirectorydisplay_WAR_webdirectoryportlet_itemEntryId=14439&_webdirectorydisplay_WAR_webdirectoryportlet_cmd=redirectToLink')
result=r.history
but result is an empty list, while the final link in a browser is http://www.dps.ir/
You should check the result of that URL first.
>>> r.content
'<script type="text/javascript">window.location.href="http://www.dps.ir";</script> '
The requests library doesn't execute JavaScript, which explains why the history is empty.
PS: Btw you could give phantomjs a shot.
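For a script-only redirect like this one, you can also pull the target URL straight out of the response body with a regular expression instead of driving a full browser. A minimal sketch using the content shown above:

```python
import re

# the body returned by requests for the dir.iran.ir link (from above)
body = '<script type="text/javascript">window.location.href="http://www.dps.ir";</script> '

# match window.location.href = "..." (or window.location = "...")
m = re.search(r'window\.location(?:\.href)?\s*=\s*["\']([^"\']+)["\']', body)
final_url = m.group(1) if m else None
print(final_url)  # http://www.dps.ir
```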

extracting html code from urls list [closed]

Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 5 years ago.
I want to get an HTML value from many URLs on the same domain. For example, the value is a name and the domain is Facebook, with URLs like
https://www.facebook.com/mohamed.nazem2
If you open that URL you will see the name Mohamed Nazem, as shown in the page source:
Mohamed Nazem (ناظِم)
Likewise, for the Facebook URL
https://www.facebook.com/zuck
the name is Mark Zuckerberg. So the value at the first URL is Mohamed Nazem, and at the second it's Mark Zuckerberg. Hopefully you get what I mean.
To fetch the HTML page for each url you will need to use something like the requests library. To install it, use pip install requests and then in your code use it like so:
import requests
response = requests.get('https://facebook.com/zuck')
print(response.text)  # .text holds the decoded HTML; a requests Response has no .data attribute
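Once you have the HTML, the name can be pulled out of the page's `<title>` tag. A sketch with the standard-library parser (the sample HTML is made up, and real Facebook pages may require login and render much of their content with JavaScript):

```python
from html.parser import HTMLParser

class TitleExtractor(HTMLParser):
    """Collects the text inside the first <title> element."""
    def __init__(self):
        super().__init__()
        self.in_title = False
        self.title = ''

    def handle_starttag(self, tag, attrs):
        if tag == 'title':
            self.in_title = True

    def handle_endtag(self, tag):
        if tag == 'title':
            self.in_title = False

    def handle_data(self, data):
        if self.in_title:
            self.title += data

# made-up sample in the shape of a profile page
sample = '<html><head><title>Mark Zuckerberg</title></head><body></body></html>'
p = TitleExtractor()
p.feed(sample)
print(p.title)  # Mark Zuckerberg
```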

How to wait for the page to load before scraping it? [closed]

Closed. This question needs details or clarity. It is not currently accepting answers.
Closed 6 years ago.
I want to extract the HTML from a webpage:
import urllib2
req = urllib2.Request('https://www.example.com')
response = urllib2.urlopen(req)
fullhtml = response.read()
I tried with urllib2, but since the page is built dynamically, the HTML content is empty.
Is there a way to wait for the javascript to load?
Take a look at http://phantomjs.org/. Many websites are JavaScript-based, and PHP or Python cannot execute that JavaScript. I think this library will be the best you can get.
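PhantomJS development has since been suspended; a commonly used alternative is Selenium driving a real browser, where you can explicitly wait for the document to finish loading. A sketch under the assumption that ChromeDriver is installed (the function name is mine):

```python
def get_rendered_html(url, timeout=10):
    # selenium is imported inside the function so the sketch can be
    # defined even without the package installed
    from selenium import webdriver
    from selenium.webdriver.support.ui import WebDriverWait

    driver = webdriver.Chrome()
    try:
        driver.get(url)
        # block until the browser reports the page fully loaded
        WebDriverWait(driver, timeout).until(
            lambda d: d.execute_script('return document.readyState') == 'complete')
        return driver.page_source
    finally:
        driver.quit()
```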

how to implement request GET in Python [closed]

Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 8 years ago.
I am having difficulty implementing a GET request.
For example, I want to fetch, and present in a convenient form, the information from this page: GET https://www.bitstamp.net/api/transactions/
API used: https://www.bitstamp.net/api/
I'm interested in everything from the syntax to the modules I need to install for this request.
Have you already seen this?
http://docs.python-requests.org/en/latest/
e.g:
import requests
response = requests.get('https://www.bitstamp.net/api/transactions/')
print(response.json())
If you don't want to install an extra library, you can use pythons urllib2 library that is just as easy for something like connecting to a url.
import urllib2
print urllib2.urlopen("https://www.bitstamp.net/api/transactions/").read()
For parsing that, use Python's json library.
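For example, json.loads turns the JSON text into Python lists and dicts. The sample below is made up, but follows the shape of the transactions endpoint:

```python
import json

# made-up sample in the shape returned by /api/transactions/
raw = '[{"date": "1480000000", "tid": 1, "price": "750.00", "amount": "0.5"}]'
transactions = json.loads(raw)
for t in transactions:
    print(t['price'], t['amount'])  # 750.00 0.5
```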
There is a Python lib for that, the bitstamp-python-client. Do not waste your time reinventing the wheel. ;-)
