Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 3 years ago.
from google.appengine.api import urlfetch
from google.appengine.ext import webapp

class sss(webapp.RequestHandler):
    def get(self):
        url = "http://www.google.com/"
        result = urlfetch.fetch(url)
        if result.status_code == 200:
            self.response.out.write(result.content)
When I change the code to this:
if result.status_code == 200:
    self.response.out.write(result.content.decode('utf-8').encode('gb2312'))
It shows something strange. What should I do?
When I use this:
    self.response.out.write(result.content.decode('big5'))
The page is different from the one I see at Google.com.
How do I get the Google.com page as I see it in my browser?
Google is probably serving you ISO-8859-1. At least, that is what they serve me for the User-Agent "AppEngine-Google; (+http://code.google.com/appengine)" (which urlfetch uses). The Content-Type header value is:
text/html; charset=ISO-8859-1
So you would use:
result.content.decode('ISO-8859-1')
If you check result.headers["Content-Type"], your code can adapt to changes on the other end. You can generally pass the charset (ISO-8859-1 in this case) directly to the Python decode method.
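A minimal sketch of that header check, using only the standard library. The helper names charset_from_content_type and decode_body are made up for illustration, and falling back to ISO-8859-1 when the header names no charset is an assumption, not something urlfetch does for you.

```python
from email.message import Message


def charset_from_content_type(content_type, default="ISO-8859-1"):
    """Return the charset parameter of a Content-Type value, or default."""
    msg = Message()
    msg["Content-Type"] = content_type
    # get_content_charset() lowercases the charset name and returns the
    # fallback object unchanged when the header has no charset parameter.
    return msg.get_content_charset(default)


def decode_body(body, content_type):
    """Decode raw response bytes using the charset named in the header."""
    charset = charset_from_content_type(content_type)
    # errors="replace" keeps the page readable even if a few bytes do not
    # match the declared charset.
    return body.decode(charset, errors="replace")
```

With the handler from the question, this might be called as decode_body(result.content, result.headers["Content-Type"]).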
How to get Google.com that I saw?
It's probably using relative URLs to images, JavaScript, CSS, etc., that you're not rewriting into absolute URLs pointing at Google's site. To confirm this, check your logs: they should show 404 errors ("page not found") as the browser to which you're serving "just the HTML" tries to locate the relative-addressed resources that you're not supplying.
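To illustrate what turning relative URLs into absolute ones means, here is a small sketch with urllib.parse.urljoin. The example paths are invented, and a real fix would also have to find the links inside the HTML, which this sketch does not attempt.

```python
from urllib.parse import urljoin


def absolutize(base_url, links):
    """Resolve each link against the URL of the page it was found on."""
    # urljoin leaves links that are already absolute untouched.
    return [urljoin(base_url, link) for link in links]


# Hypothetical links of the kind a fetched page might contain.
example_links = [
    "/images/logo.png",          # root-relative
    "intl/en/about.html",        # relative to the current directory
    "http://example.com/a.css",  # already absolute
]
resolved = absolutize("http://www.google.com/", example_links)
```

The first two entries come back prefixed with http://www.google.com/, which is where the browser would need to request them from.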
Closed. This question needs details or clarity. It is not currently accepting answers.
Closed 2 years ago.
I want to get the HTTP Server header with Python. I don't know how to get it. I looked at the RFC for the Hypertext Transfer Protocol and found the Server section:
The Server response-header field contains information about the
software used by the origin server to handle the request. The field
can contain multiple product tokens (section 3.8) and comments
identifying the server and any significant subproducts. The product
tokens are listed in order of their significance for identifying the
application.
How can I get it? My guess is with os or platform, etc.
I am assuming that:
You want to send HTTP requests to a web server and retrieve the 'Server' header from the HTTP response.
You want to use Python.
'requests' is a very popular library for making HTTP requests (https://requests.readthedocs.io/en/master/).
Here is a code sample that may guide you in achieving what you need:
import requests
response = requests.get("http://example.com")
print(response.headers['Server'])
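If installing requests is not an option, the same header can be read with the standard library. The sketch below is one way to do it with http.client; get_server_header is an illustrative name, and the small local server exists only so the example can run without internet access.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer


def get_server_header(host, port=80, path="/"):
    """Send a HEAD request and return the Server header, or None if absent."""
    conn = http.client.HTTPConnection(host, port, timeout=5)
    try:
        conn.request("HEAD", path)
        response = conn.getresponse()
        return response.getheader("Server")
    finally:
        conn.close()


class _DemoHandler(BaseHTTPRequestHandler):
    """Throwaway local server used only by demo()."""

    def do_HEAD(self):
        self.send_response(200)  # send_response() also adds the Server header
        self.end_headers()

    def log_message(self, format, *args):
        pass  # keep the demo quiet


def demo():
    """Start the local server, query it, and return its Server header."""
    server = HTTPServer(("127.0.0.1", 0), _DemoHandler)  # port 0: any free port
    threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        return get_server_header("127.0.0.1", server.server_port)
    finally:
        server.shutdown()
        server.server_close()
```

Against a real site you would call get_server_header("example.com") directly. A server is free to omit the header, in which case the function returns None.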
Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 5 years ago.
I want to extract the same HTML value from many URLs on one domain. For example,
the value is the profile name,
the domain is Facebook,
and the URLs look like
https://www.facebook.com/mohamed.nazem2
If you open that URL, you will see that the name is Mohamed Nazem,
as shown by the code:
Mohamed Nazem (ناظِم)
Likewise, for this Facebook URL
https://www.facebook.com/zuck
the value is
Mark Zuckerberg
So the value for the first URL is Mohamed Nazem
and for the second URL it is Mark Zuckerberg.
Hopefully you get what I have in mind.
To fetch the HTML page for each URL you will need something like the requests library. To install it, run pip install requests, and then use it in your code like so:
import requests
response = requests.get('https://facebook.com/zuck')
print(response.text)
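Once the HTML is fetched, the value still has to be pulled out of it. The markup shown in the question was lost, so the sketch below assumes, purely for illustration, that the name sits in the page's title element; TitleParser and extract_title are made-up names. Real profile pages may require a login or serve different markup to scripts.

```python
from html.parser import HTMLParser


class TitleParser(HTMLParser):
    """Collect the text found inside the <title> element."""

    def __init__(self):
        super().__init__()
        self._in_title = False
        self.title_parts = []

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title_parts.append(data)


def extract_title(html_text):
    """Return the stripped title text of an HTML document ('' if none)."""
    parser = TitleParser()
    parser.feed(html_text)
    parser.close()
    return "".join(parser.title_parts).strip()


# Hand-written sample page standing in for a fetched response body.
sample = "<html><head><title>Mark Zuckerberg</title></head><body></body></html>"
name = extract_title(sample)
```

To cover many URLs, loop over them, fetch each page, and pass the response body to extract_title.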
Closed. This question needs details or clarity. It is not currently accepting answers.
Closed 6 years ago.
For example, I tried getting Python to read the following filtered page
http://www.hearthpwn.com/cards?filter-attack-val=1&filter-attack-op=1&display=1
but Python only gets the unfiltered page http://www.hearthpwn.com/cards instead.
The standard library urllib2 normally follows redirects. If retrieving this URL used to work without being redirected, then the site has changed.
Although you can prevent following the redirect within urllib2 (by providing an alternative HTTP handler), I recommend using requests, where you can do:
import requests
r = requests.get('http://www.hearthpwn.com/cards?filter-attack-val=1'
                 '&filter-attack-op=1&display=1', allow_redirects=False)
print(r)
giving you:
<Response [302]>
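For completeness, here is a sketch of the "alternative HTTP handler" route mentioned above. It is written against Python 3's urllib.request, the successor to urllib2; NoRedirectHandler and fetch_without_redirects are illustrative names, not part of the library.

```python
import urllib.error
import urllib.request


class NoRedirectHandler(urllib.request.HTTPRedirectHandler):
    """Redirect handler that refuses to follow any redirect."""

    def redirect_request(self, req, fp, code, msg, headers, newurl):
        # Returning None tells urllib not to build a follow-up request,
        # so the 3xx response surfaces as an HTTPError instead.
        return None


def fetch_without_redirects(url):
    """Return (status code, Location header or None) without following redirects."""
    opener = urllib.request.build_opener(NoRedirectHandler)
    try:
        with opener.open(url, timeout=10) as response:
            return response.status, None
    except urllib.error.HTTPError as err:
        # Redirects (and other error statuses) end up here.
        return err.code, err.headers.get("Location")
```

The Location value shows where the site is sending you, which is usually the quickest way to see why the filtered page never arrives.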
Closed. This question needs details or clarity. It is not currently accepting answers.
Closed 6 years ago.
I am using Django 1.9 to build a link shortener. I have created a simple HTML page where the user can enter the long URL. I have also coded the methods for shortening this URL. The data is getting stored in the database and I am able to display the shortened URL to the user.
I want to know what I have to do next. What happens when a user visits the shorter URL? Should I use redirects or something else? I am totally clueless about this topic.
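Normally, when you provide a URL shortener, a request for the short URL is answered with a redirect to the original URL, typically 301 Moved Permanently.
from django.http import HttpResponseRedirect

def resolve_url(request, url):
    origin_url = resolve(url)  # your own lookup: read from redis or so.
    # HttpResponseRedirect sends a 302; HttpResponsePermanentRedirect sends a 301.
    return HttpResponseRedirect(origin_url)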
EDIT:
add code using #danny-cullen hint
You could just navigate to the URL via HttpResponseRedirect
Instead of writing the same code in every view, write a middleware: if the requested short URL is in the model where you stored it, redirect to the long URL using HttpResponseRedirect.
from django.http import HttpResponseRedirect

class RedirectMiddleware(object):
    def process_request(self, request):
        # lookup_long_url is a placeholder for your own model query on the
        # requested path; it should return None when no short URL matches.
        full_url = lookup_long_url(request.path)
        if full_url is not None:
            return HttpResponseRedirect(full_url)
        return None  # let Django continue with normal view handling
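Stripped of Django, the step both answers describe is a lookup followed by a redirect. The sketch below shows only that decision; the dictionary stands in for the database table, and resolve_short_code, url_store, and the example URL are all hypothetical.

```python
def resolve_short_code(code, url_store):
    """Return (status, headers) for a request to the given short code."""
    long_url = url_store.get(code)
    if long_url is None:
        # Unknown short code: nothing to redirect to.
        return 404, {}
    # Known short code: point the browser at the original URL.
    return 301, {"Location": long_url}


# In-memory stand-in for the table of stored short URLs.
url_store = {"abc123": "https://example.com/some/very/long/path"}

found = resolve_short_code("abc123", url_store)
missing = resolve_short_code("nope", url_store)
```

In a Django view the two outcomes would map to a redirect response and a 404 response respectively.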
Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 9 years ago.
How can I send input to a website from a Python program and display the result in the terminal after the online computation, using a Python wrapper? I am new to Python, so can anyone suggest a tutorial for this?
It all depends on the website you want to retrieve your result from and how it accepts input. For example, if the page accepts GET or POST requests, you can send it an HTTP request and print the response to the terminal.
If the website accepts input via a form, on the other hand, you would have to find the URL that the form submits to and send your data to that URL.
There is a Python library called Requests, which you can use to send HTTP requests to a web page and get the response. I suggest you read its documentation; it has some good examples you can base your code on. Another option is the built-in urllib2, which would also work for your purposes.
The response to your request is most likely to be an HTML page, so you may have to scrape your desired content out of it.
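As a rough sketch of the GET and POST cases using Python 3's standard library (urllib.request is the successor to urllib2): the function names, the example.com URLs, and the field names are placeholders, since the real ones depend on the site you are talking to.

```python
import urllib.request
from urllib.parse import urlencode


def build_get_request(url, params):
    """Build a GET request with the input encoded in the query string."""
    return urllib.request.Request(url + "?" + urlencode(params))


def build_post_request(url, fields):
    """Build a POST request with the input encoded as form data."""
    data = urlencode(fields).encode("ascii")
    # Supplying data makes urllib send the request as a POST.
    return urllib.request.Request(url, data=data)


def send(request):
    """Send a prepared request and return the response body as text."""
    with urllib.request.urlopen(request, timeout=10) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")


# Placeholder URLs and field names, for illustration only.
get_req = build_get_request("http://example.com/search", {"q": "two words"})
post_req = build_post_request("http://example.com/submit", {"a": "1", "b": "two words"})
```

Calling print(send(post_req)) would then show the returned page in the terminal, ready for scraping.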