I'm trying to use Python 2 to send my own HTTP GET message to a web server, retrieve html text, and write it to an html file (no urllib, urllib2, httplib, requests, etc. allowed).
import socket
tcpSocket = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
tcpSocket.connect(('python.org', 80))
http_get = """GET / HTTP/1.1\r
Host: www.python.org/\r
Connection: keep-alive\r
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8\r
Upgrade-Insecure-Requests: 1\r
User-Agent: Mozilla/5.0 (Windows NT 6.3; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/45.0.2454.101 Safari/537.36\r
Accept-Encoding: gzip, deflate, sdch\r
Accept-Language: en-US,en;q=0.8\r\n\r\n"""
tcpSocket.send(http_get)
m = tcpSocket.recv(4096)
tcpSocket.close()
print m
Output:
HTTP/1.1 301 Moved Permanently
Location: https://www.python.org//
Connection: Keep-Alive
Content-length: 0
Why does it return 301 when the location is apparently still the same? What message and to where should I send next to get the html content?
Thank you very much!
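Your problem is that the URL you are requesting isn't served over http://; the server redirects to https://. The Location in the 301 therefore differs from your request in its scheme, not its host. (The doubled slash at the end of the Location most likely comes from the trailing slash in your Host header.) To show that your code fundamentally works with a proper target, I changed your GET request to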
import socket
tcpSocket = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
tcpSocket.connect(('www.cnn.com', 80))
http_get = """GET / HTTP/1.1\r
Host: www.cnn.com\r
Connection: keep-alive\r
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8\r
Upgrade-Insecure-Requests: 1\r
User-Agent: Mozilla/5.0 (Windows NT 6.3; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/45.0.2454.101 Safari/537.36\r
Accept-Encoding: gzip, deflate, sdch\r
Accept-Language: en-US,en;q=0.8\r\n\r\n"""
http_get_minimum = """GET / HTTP/1.1\r\nHost: www.cnn.com\r\nConnection: close\r\n\r\n"""
tcpSocket.send(http_get_minimum)
m = tcpSocket.recv(4096)
tcpSocket.close()
and received
HTTP/1.1 200 OK
x-servedByHost: prd-10-60-168-42.nodes.56m.dmtio.net
Cache-Control: max-age=60
X-XSS-Protection: 1; mode=block
Content-Security-Policy: default-src 'self' http://.cnn.com: https://.cnn.com: .cnn.net: .turner.com: .ugdturner.com: .vgtf.net:; script-src 'unsafe-inline' 'unsafe-eval' 'self' *; style-src 'unsafe-inline' 'self' *; frame-src 'self' *; object-src 'self' *; img-src 'self' * data:; media-src 'self' *; font-src 'self' *; connect-src 'self' *;
Content-Type: text/html; charset=utf-8
Via: 1.1 varnish
Content-Length: 74864
Accept-Ranges: bytes
Date: Mon, 05 Oct 2015 00:39:54 GMT
Via: 1.1 varnish
Age: 170
Connection: close
X-Served-By: cache-iad2144-IAD, cache-sjc3129-SJC
X-Cache: HIT, HIT
X-Cache-Hits: 2, 95
X-Timer: S1444005594.675567,VS0,VE0
Vary: Accept-Encoding
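One caveat about both scripts: a single recv(4096) returns at most 4096 bytes, while the response above announces Content-Length: 74864, so most of the page is never read. Below is a minimal Python 3 sketch of reading until the server closes the connection (which it does when the request carries Connection: close) and then splitting the result. The helper names recv_all and split_response are made up for illustration.

```python
import socket


def recv_all(sock, bufsize=4096):
    """Read from sock until the peer closes the connection."""
    chunks = []
    while True:
        chunk = sock.recv(bufsize)
        if not chunk:  # empty bytes means the peer closed its side
            break
        chunks.append(chunk)
    return b"".join(chunks)


def split_response(raw):
    """Split a raw HTTP response into (status_code, headers, body).

    Header names are lower-cased; if a header repeats (like Via above),
    only the last value is kept, which is enough for this sketch.
    """
    head, _, body = raw.partition(b"\r\n\r\n")
    lines = head.decode("iso-8859-1").split("\r\n")
    # Status line looks like: HTTP/1.1 301 Moved Permanently
    status_code = int(lines[0].split(" ", 2)[1])
    headers = {}
    for line in lines[1:]:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return status_code, headers, body
```

To use it, connect and send the request as before, then call split_response(recv_all(tcpSocket)). For the python.org exchange in the question this would give status 301 plus a location header naming the URL to request next. The body comes back exactly as received, so a chunked or gzip-compressed body would still need decoding before it is written to a file.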
UPDATE: Yes, requesting over HTTPS needs more than what you have presented. HTTP and HTTPS differ first in the default port, which is 80 for HTTP and 443 for HTTPS. HTTPS sends normal HTTP messages through an encrypted channel, so that in theory the information cannot be read by any party other than the client and the end server. The encryption layer is Transport Layer Security (TLS) or its now-deprecated predecessor, Secure Sockets Layer (SSL); both encrypt the data records being exchanged.
When using an HTTPS connection, the client opens the handshake by offering a list of encryption methods (cipher suites) it supports. The server selects one and sends its certificate so that the client can authenticate it; a client certificate is optional and rarely used on public sites. Once both sides have agreed on the same session key, they exchange the encrypted HTTP messages and finally close the connection. In order to host HTTPS connections, a server must have a public key certificate, which binds the key to a verified statement of the key owner's identity. Most certificates are signed by a third party (a certificate authority) so that clients can trust that the key really belongs to the server.
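As a rough illustration of the above, here is a Python 3 sketch that still writes the HTTP request by hand but lets the standard ssl module perform the handshake and encryption. The function names build_get_request and fetch_https are hypothetical, and the sketch assumes the server closes the connection after answering a Connection: close request.

```python
import socket
import ssl


def build_get_request(host, path="/"):
    """Return a minimal HTTP/1.1 GET request as bytes."""
    lines = [
        "GET {} HTTP/1.1".format(path),
        "Host: {}".format(host),  # host name only, no trailing slash
        "Connection: close",
    ]
    return ("\r\n".join(lines) + "\r\n\r\n").encode("ascii")


def fetch_https(host, path="/", port=443, timeout=10):
    """Send a hand-written GET over TLS and return the raw response bytes."""
    # The default context verifies the server certificate and host name.
    context = ssl.create_default_context()
    with socket.create_connection((host, port), timeout=timeout) as raw_sock:
        # wrap_socket performs the TLS handshake before returning.
        with context.wrap_socket(raw_sock, server_hostname=host) as tls_sock:
            tls_sock.sendall(build_get_request(host, path))
            chunks = []
            while True:
                chunk = tls_sock.recv(4096)
                if not chunk:  # server closed the connection
                    break
                chunks.append(chunk)
    return b"".join(chunks)
```

Calling fetch_https("www.python.org") should then return the headers and HTML that the plain port-80 request was redirected to, provided the machine has network access and an up-to-date certificate store.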
I had the same problem and changing port from 80 to 443 solved it.
Related
I have a jQuery Ajax call, like so:
$("#tags").keyup(function(event) {
  $.ajax({url: "/terms",
    type: "POST",
    contentType: "application/json",
    data: JSON.stringify({"prefix": $("#tags").val() }),
    dataType: "json",
    success: function(response) { display_terms(response.terms); }
  });
});
I have a Flask method like so:
@app.route("/terms", methods=["POST"])
def terms_by_prefix():
    req = flask.request.json
    tlist = terms.find_by_prefix(req["prefix"])
    return flask.jsonify({'terms': tlist})
tcpdump shows the HTTP dialog:
POST /terms HTTP/1.1
Host: 127.0.0.1:5000
User-Agent: Mozilla/5.0 (X11; Linux i686; rv:12.0) Gecko/20100101 Firefox/12.0
Accept: application/json, text/javascript, */*; q=0.01
Accept-Language: en-us,en;q=0.5
Accept-Encoding: gzip, deflate
Connection: keep-alive
Content-Type: application/json; charset=UTF-8
X-Requested-With: XMLHttpRequest
Referer: http://127.0.0.1:5000/
Content-Length: 27
Pragma: no-cache
Cache-Control: no-cache
{"prefix":"foo"}
However, Flask replies without keep-alive.
HTTP/1.0 200 OK
Content-Type: application/json
Content-Length: 445
Server: Werkzeug/0.8.3 Python/2.7.2+
Date: Wed, 09 May 2012 17:55:04 GMT
{"terms": [...]}
Is it really the case that keep-alive is not implemented?
The default request_handler is WSGIRequestHandler.
Before app.run(), add one line:
WSGIRequestHandler.protocol_version = "HTTP/1.1"
Don't forget the import: from werkzeug.serving import WSGIRequestHandler.
Werkzeug's integrated web server builds on BaseHTTPServer from Python's standard library. BaseHTTPServer seems to support Keep-Alives if you set its HTTP protocol version to 1.1.
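Werkzeug doesn't do it, but if you're ready to hack into the machinery that Flask uses to instantiate Werkzeug's BaseWSGIServer, you can do it yourself. See Flask.run(), which calls werkzeug.serving.run_simple(). What you have to do boils down to setting protocol_version = "HTTP/1.1" on the request handler class (Werkzeug's WSGIRequestHandler), because in BaseHTTPServer that attribute belongs to the request handler rather than to the server class.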
I haven't tested the solution. I suppose you do know that Flask's web server ought to be used for development only.
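The underlying mechanism can be tried without Flask or Werkzeug. The sketch below uses Python 3's http.server (the successor of BaseHTTPServer); the handler and function names are invented for the demo. The handler echoes the client's source port, so if two requests report the same port, they travelled over the same kept-alive connection.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer


class KeepAliveHandler(BaseHTTPRequestHandler):
    # The default is "HTTP/1.0", which closes the connection after each reply.
    protocol_version = "HTTP/1.1"

    def do_GET(self):
        # Echo the client's source port so connection reuse is visible.
        body = str(self.client_address[1]).encode("ascii")
        self.send_response(200)
        self.send_header("Content-Type", "text/plain")
        # With keep-alive the client needs Content-Length to find the end
        # of the body, because the connection is not closed to mark it.
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the demo quiet


def demo_two_requests_on_one_connection():
    """Return [(http_version, client_port_text), ...] for two GETs."""
    server = HTTPServer(("127.0.0.1", 0), KeepAliveHandler)  # port 0: any free port
    threading.Thread(target=server.serve_forever, daemon=True).start()
    conn = http.client.HTTPConnection("127.0.0.1", server.server_port, timeout=5)
    results = []
    try:
        for _ in range(2):
            conn.request("GET", "/terms")
            response = conn.getresponse()
            port_text = response.read().decode("ascii")
            results.append((response.version, port_text))
        return results
    finally:
        conn.close()       # lets the handler loop finish
        server.shutdown()  # then stop serve_forever
        server.server_close()
```

Both entries returned by demo_two_requests_on_one_connection() should carry version 11 (http.client's code for HTTP/1.1) and the same port. Whether Werkzeug's development server behaves identically once its handler is switched to HTTP/1.1 is not verified here.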
There are similar questions posted, but I still seem to have a problem. I am expecting to receive a registration email after running this. I receive nothing. Two questions. What is wrong? How would I even know if the data was successfully submitted as opposed to the page just loading normally?
import requests

serviceurl = 'https://signup.com/'
payload = {'register-fname': 'Peter', 'register-lname': "Parker", 'register-email': 'xyz@email.com', 'register-password': '9dlD313kF'}
r2 = requests.post(serviceurl, data=payload)
print(r2.status_code)
The url for the POST request is actually https://signup.com/api/users, and it returns 200 (in my browser).
You need to replicate what your browser does. This might include certain request headers.
You will want to use your browser's dev tools/network inspector to gather this information.
The information below is from Firefox on my computer:
Request headers:
Host: signup.com
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:73.0) Gecko/20100101 Firefox/73.0
Accept: application/json, text/plain, */*
Accept-Language: en-US,en;q=0.5
Accept-Encoding: gzip, deflate, br
Content-Type: application/json;charset=utf-8
Content-Length: 107
Origin: https://signup.com
Connection: keep-alive
Referer: https://signup.com/
Cookie: _vspot_session_id=ce1937cf52382239112bd4b98e0f1bce; G_ENABLED_IDPS=google; _ga=GA1.2.712393353.1584425227; _gid=GA1.2.1095477818.1584425227; __utma=160565439.712393353.1584425227.1584425227.1584425227.1; __utmb=160565439.2.10.1584425227; __utmc=160565439; __utmz=160565439.1584425227.1.1.utmcsr=(direct)|utmccn=(direct)|utmcmd=(none); __utmt=1; __qca=P0-1580853344-1584425227133; _gat=1
Pragma: no-cache
Cache-Control: no-cache
Payload:
{"status":true,"code":null,"email":"TestEmail@hotmail.com","user":{"id":20540206,"email":"TestEmail@hotmail.com","name":"TestName TestSurname","hashedpassword":"4ffdbb1c33d14ed2bd02164755c43b4ad8098be2","salt":"700264767700800.7531319164902858","accesskey":"68dd25c3ae0290be69c0b59877636a5bc5190078","isregistered":true,"activationkey":"f1a6732b237379a8a1e6c5d14e58cf4958bf2cea","isactivated":false,"chgpwd":false,"timezone":"","phonenumber":"","zipcode":"","gender":"N","age":-1,"isdeferred":false,"wasdeferred":false,"deferreddate":null,"registerdate":"2020/03/17 06:09:27 +0000","activationdate":null,"addeddate":"2020/03/17 06:09:27 +0000","admin":false,"democount":0,"demodate":null,"invitationsrequest":null,"isvalid":true,"timesinvalidated":0,"invaliddate":null,"subscribe":0,"premium":false,"contributiondate":null,"contributionamount":0,"premiumenddate":null,"promo":"","register_token":"","premiumstartdate":null,"premiumsubscrlength":0,"initial_reg_type":"","retailmenot":null,"sees":null,"created_at":"2020/03/17 06:09:27 +0000","updated_at":"2020/03/17 06:09:27 +0000","first_name":"TestName","last_name":"TestSurname"},"first_name":"TestName","last_name":"TestSurname","mobile_redirect":false}
There's a lot to replicate. Things like the hashed password, salt, dates, etc would have been generated by JavaScript executed by your browser.
Keep in mind, the website owner might not appreciate a bot creating user accounts.
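To make "replicate what your browser does" a little more concrete, here is a standard-library sketch that only builds such a request and does not send it. It encodes the body as JSON and sets the Content-Type seen in the captured headers, instead of the form encoding that data=payload produces. The helper name build_json_post is hypothetical, and the JSON field names the real endpoint expects are not known from this thread, so the ones in the question are only placeholders.

```python
import json
import urllib.request


def build_json_post(url, fields, extra_headers=None):
    """Build (but do not send) a POST request with a JSON body."""
    body = json.dumps(fields).encode("utf-8")
    headers = {
        # Taken from the browser capture above.
        "Content-Type": "application/json;charset=utf-8",
        "Accept": "application/json, text/plain, */*",
    }
    headers.update(extra_headers or {})
    return urllib.request.Request(url, data=body, headers=headers, method="POST")


def describe(request):
    """Return a short, printable summary of a prepared request."""
    return "{} {} ({} bytes)".format(
        request.get_method(), request.full_url, len(request.data)
    )
```

With the requests library the equivalent call would be requests.post(url, json=fields, headers=headers). Comparing the status code and the response body with what the browser's network inspector shows is the practical way to tell whether the submission was accepted.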
I'm setting up google pub/sub on a Flask server and have successfully set the endpoint to POST to https://myapp.ngrok.io/pubsub/push/ according to the documentation.
In my console it returns this request with a 400 error:
66.102.8.237 - - [24/Oct/2019:04:30:35 +0000] "POST /pubsub/push/ HTTP/1.1" 400 148 "-" "APIs-Google; (+https://developers.google.com/webmasters/APIs-Google.html)"
I'm trying to access the message body to troubleshoot the 400 error but haven't been able to print the message body using print(request.get_json()).
Is there a way I can access the HTTP message body in Flask or is the above error the only information sent to my app?
The raw message body is available through request.get_data(), and the HTTP headers of the request are stored in the headers attribute of the request object. So, for example, if you do
print(request.headers)
you'd get something like this on the console:
Host: localhost:5000
User-Agent: Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:70.0) Gecko/20100101 Firefox/70.0
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
Accept-Language: en-US,en;q=0.5
Accept-Encoding: gzip, deflate
Connection: keep-alive
Cookie: csrftoken=CQmXNt256FqZev0S2fRtw04ZSTlUnvYHGRbNn6NH5OVn36W7qPMZw0s9N3anGHMG
Upgrade-Insecure-Requests: 1
Cache-Control: max-age=0
I'd like to call the WebIOPi REST API from my Angular application in a browser running on the Raspberry Pi. As the WebIOPi HTTP server doesn't allow CORS requests, I have created a proxy with Apache that adds the header with Header add "Access-Control-Allow-Origin" "*".
This is working fine; however, the call to the REST API throws many errors, mainly because for a CORS request the browser first sends an OPTIONS request to the server to check whether the request is allowed. But the WebIOPi HTTP handler doesn't handle the OPTIONS verb at all.
So I started to write it into the code myself with zero python experience. In the file python/webiopi/protocols/http.py I have added at the end:
def do_OPTIONS(self):
    self.send_response(200,"ok")
    self.send_header("Access-Control-Allow-Origin", "*")
    self.send_header("Access-Control-Allow-Methods", "POST,GET,OPTIONS")
    self.send_header("Access-Control-Allow-Headers", "Authorization")
    self.send_header("Access-Control-Allow-Headers", "Content-Type")
    self.end_headers()
Now it doesn't throw any error, but I don't get a proper response to my GET request; it just stops after the OPTIONS. The request and response look like this:
Request headers:
OPTIONS /GPIO/1/value HTTP/1.1
Host: localhost:8000
Connection: keep-alive
Access-Control-Request-Method: GET
Origin: http://192.168.1.108:51443
User-Agent: Mozilla/5.0 (X11; Linux armv7l) AppleWebKit/537.36 (KHTML, like Gecko) Raspbian Chromium/65.0.3325.181 Chrome/65.0.3325.181 Safari/537.36
Access-Control-Request-Headers: authorization
Accept: */*
Accept-Encoding: gzip, deflate, br
Accept-Language: hu-HU,hu;q=0.9,en-US;q=0.8,en;q=0.7
Response headers:
HTTP/1.1 200 OK
Date: Fri, 23 Nov 2018 22:06:28 GMT
Server: WebIOPi/0.7.1/Python3.5
Access-Control-Allow-Origin: *
Access-Control-Allow-Methods: POST,GET,OPTIONS
Access-Control-Allow-Origin: *
Access-Control-Allow-Methods: GET,POST,OPTIONS
Keep-Alive: timeout=5, max=100
Connection: Keep-Alive
Transfer-Encoding: chunked
General (from chrome network tab):
Request URL: http://localhost:8000/GPIO/1/value
Request Method: OPTIONS
Status Code: 200 OK
Remote Address: [::1]:8000
Referrer Policy: no-referrer-when-downgrade
Where is my GET request? Why do I see only the OPTIONS which by the way I'm not initiating at all?
The request from angular:
this.http.get<number>(this.route+'GPIO/'+gpio+'/value').subscribe(result => {
resolve(result);
})
I had to allow all methods and headers in the HTTP server's OPTIONS handler:
def do_OPTIONS(self):
    self.send_response(200,"ok")
    self.send_header("Access-Control-Allow-Origin", "*")
    self.send_header("Access-Control-Allow-Methods", "*")
    self.send_header("Access-Control-Allow-Headers", "*")
    self.end_headers()
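For reference, here is a self-contained Python 3 sketch of a handler that answers the preflight and also sends the CORS headers on the real GET. It is not WebIOPi's code; the class name, the placeholder body, and the test helper are invented. It lists the allowed request headers explicitly instead of using "*", on the assumption that the Angular call sends Authorization, as the Access-Control-Request-Headers line in the question suggests. Note that the response captured in the question contains no Access-Control-Allow-Headers line at all, which by itself would be enough for the browser to withhold the GET.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer


class CorsHandler(BaseHTTPRequestHandler):
    def _send_cors_headers(self):
        self.send_header("Access-Control-Allow-Origin", "*")
        self.send_header("Access-Control-Allow-Methods", "GET, POST, OPTIONS")
        # One header with a comma-separated list.
        self.send_header("Access-Control-Allow-Headers", "Authorization, Content-Type")

    def do_OPTIONS(self):
        # Preflight reply: headers only, no body.
        self.send_response(204)
        self._send_cors_headers()
        self.send_header("Content-Length", "0")
        self.end_headers()

    def do_GET(self):
        body = b"1"  # placeholder for the real GPIO value
        self.send_response(200)
        self._send_cors_headers()  # the actual response needs them too
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the demo quiet


def preflight_then_get(path="/GPIO/1/value"):
    """Send OPTIONS, then GET, to a throwaway local server; return the replies."""
    server = HTTPServer(("127.0.0.1", 0), CorsHandler)  # port 0: any free port
    threading.Thread(target=server.serve_forever, daemon=True).start()
    results = []
    try:
        for method in ("OPTIONS", "GET"):
            conn = http.client.HTTPConnection("127.0.0.1", server.server_port, timeout=5)
            conn.request(method, path, headers={"Origin": "http://192.168.1.108:51443"})
            response = conn.getresponse()
            allow_headers = response.getheader("Access-Control-Allow-Headers")
            results.append((response.status, allow_headers, response.read()))
            conn.close()
    finally:
        server.shutdown()
        server.server_close()
    return results
```

If a proxy in front of the server adds the same CORS headers, as the duplicated lines in the captured response hint, it is safer to set them in only one place.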
I'm creating a forum status grabber, and I want to use sockets to fetch the data from the forum. I write the request headers to the socket, but the server answers with a 400 error. I made a test script to check this, and I still get the error.
import socket
s = socket.socket()
s.connect(("198.57.47.136", 80))
header = """
GET / HTTP/1.1\r\n
Host: httn
Connection: keep-alive\r\n
Cache-Control: max-age=0\r\n
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8\r\n
User-Agent: Mozilla/5.0 (Windows NT 6.3) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/39.0.2171.95 Safari/537.36 OPR/26.0.1656.60\r\n
Accept-Encoding: gzip, deflate, lzma, sdch\r\n
Accept-Language: en-GB,en-US;q=0.8,en;q=0.6\r\n
"""
s.send(header)
print s.recv(10000)
Which returns
HTTP/1.1 400 Bad Request
Server: nginx
Date: Thu, 01 Jan 2015 21:43:47 GMT
Content-Type: text/html
Content-Length: 166
Connection: close
<html>
<head><title>400 Bad Request</title></head>
<body bgcolor="white">
<center><h1>400 Bad Request</h1></center>
<hr><center>nginx</center>
</body>
</html>
A multi-line (triple-quoted) Python string already contains a \n at the end of every source line, so writing \r\n as well gives each line an extra \n. Note:
>>> s = '''
... Host: rile5.com\r\n
... '''
>>>
>>> s
'\nHost: rile5.com\r\n\n'
There is an extra empty first line, and every line ends in \r\n\n instead of \r\n. This works, but not on the original IP address you used:
import socket
s = socket.socket()
s.connect(("rile5.com", 80))
header = b"""\
GET / HTTP/1.1\r
Host: rile5.com\r
Connection: keep-alive\r
Cache-Control: max-age=0\r
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8\r
User-Agent: Mozilla/5.0 (Windows NT 6.3) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/39.0.2171.95 Safari/537.36 OPR/26.0.1656.60\r
Accept-Encoding: gzip, deflate, lzma, sdch\r
Accept-Language: en-GB,en-US;q=0.8,en;q=0.6\r
\r
"""
s.sendall(header)
print(s.recv(10000))
Note the extra backslash after the opening quotes. This suppresses the initial newline.
header = b"""\
Also note the extra blank line at the end. This is required so the server knows the header is complete.
Why not just use urllib.request?
Probably the problem is with the format of your request.
First, your HTTP request starts with a line feed. Also, the lines in an HTTP request must be separated by \r\n, while Python multi-line strings only insert \n. Since you also wrote a literal \r\n on some of the lines (but not all), the line endings are inconsistent.
Finally, the header must end with an empty line.
My advice is to use a list of strings without any line ending, and then join them:
header_lines = [
"GET / HTTP/1.1",
"Host: httn",
"Connection: keep-alive",
...
]
header = "\r\n".join(header_lines) + "\r\n\r\n"
Note that since str.join() does not add a final EOL, you have to add two of them to include the mandatory empty line.
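The same idea as a small Python 3 sketch, together with a check that exposes the triple-quote pitfall. The function names are made up, and example.com is a placeholder host, not the one from the question.

```python
def build_request(method, path, headers):
    """Join the request line and (name, value) header pairs with CRLF."""
    lines = ["{} {} HTTP/1.1".format(method, path)]
    lines.extend("{}: {}".format(name, value) for name, value in headers)
    # join() puts no CRLF after the last line, so add two: one to end the
    # last header and one for the mandatory empty line.
    return ("\r\n".join(lines) + "\r\n\r\n").encode("ascii")


def has_bare_newline(data):
    """True if data contains a line feed that is not part of a CRLF pair."""
    return b"\n" in data.replace(b"\r\n", b"")


# A triple-quoted literal written like the one in the question:
# a leading newline, then \r\n plus the source newline on every line.
BROKEN = b"""
GET / HTTP/1.1\r\n
Host: example.com\r\n
"""

GOOD = build_request("GET", "/", [("Host", "example.com"), ("Connection", "close")])
```

Here has_bare_newline(BROKEN) is true and has_bare_newline(GOOD) is false, which is a cheap check to run on a hand-built request before sending it.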