How to make a correct TCP request using Python

I am trying to make a request, but google.com returns status 400 when it should return 302. What's wrong with my request? Do I need an additional request header? Any ideas?
Current code:
import socket
host = "www.google.com"
port = 80
client = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
client.connect((host,port))
client.send("GET / HTTP1.1\r\nHost: www.google.com\r\n\r\n")
response = client.recv(4096)
print response
Response:
HTTP/1.0 400 Bad Request
Content-Type: text/html; charset=UTF-8
Content-Length: 1504
Date: Mon, 07 Sep 2015 16:25:02 GMT
Server: GFE/2.0
<!DOCTYPE html>
<html lang=en>
<meta charset=utf-8>
<meta name=viewport content="initial-scale=1, minimum-scale=1, width=device-width">
<title>Error 400 (Bad Request)!!1</title>
<style>
*{margin:0;padding:0}html,code{font:15px/22px arial,sans-serif}html{background:#fff;color:#222;padding:15px}body{margin:7% auto 0;max-width:390px;min-height:180px;padding:30px 0 15px}* > body{background:url(//www.google.com/images/errors/robot.png) 100% 5px no-repeat;padding-right:205px}p{margin:11px 0 22px;overflow:hidden}ins{color:#777;text-decoration:none}a img{border:0}@media screen and (max-width:772px){body{background:none;margin-top:0;max-width:none;padding-right:0}}#logo{background:url(//www.google.com/images/logos/errorpage/error_logo-150x54.png) no-repeat;margin-left:-5px}@media only screen and (min-resolution:192dpi){#logo{background:url(//www.google.com/images/logos/errorpage/error_logo-150x54-2x.png) no-repeat 0% 0%/100% 100%;-moz-border-image:url(//www.google.com/images/logos/errorpage/error_logo-150x54-2x.png) 0}}@media only screen and (-webkit-min-device-pixel-ratio:2){#logo{background:url(//www.google.com/images/logos/errorpage/error_logo-150x54-2x.png) no-repeat;-webkit-background-size:100% 100%}}#logo{display:inline-block;height:54px;width:150px}
</style>
<a href=//www.google.com/><span id=logo aria-label=Google></span></a>
<p><b>400.</b> <ins>That’s an error.</ins>
<p>Your client has issued a malformed or illegal request. <ins>That’s all we know.</ins>

In the string you pass to send, you left out the slash in the HTTP protocol version: it must be HTTP/1.1, not HTTP1.1.
client.send("GET / HTTP1.1\r\nHost: www.google.com\r\n\r\n")
should be:
client.send('GET / HTTP/1.1\r\nHost: www.google.com\r\n\r\n')
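A quick way to catch this kind of typo before sending is to check the request line against the expected shape. The sketch below is a simplified check, not a full HTTP parser; the pattern and the function name is_well_formed_request_line are my own illustration and do not come from any library.

```python
import re

# Simplified shape of a request line: METHOD, one space, target, one space,
# then "HTTP/" followed by major.minor version digits.
REQUEST_LINE = re.compile(r"[A-Z]+ \S+ HTTP/\d\.\d")

def is_well_formed_request_line(line):
    """Return True if `line` has the basic shape of an HTTP/1.x request line."""
    return REQUEST_LINE.fullmatch(line) is not None

print(is_well_formed_request_line("GET / HTTP1.1"))   # False: slash missing
print(is_well_formed_request_line("GET / HTTP/1.1"))  # True
```

The check only covers the first line of the request; the headers and the terminating blank line still have to be right as well.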

Related

bad request socket python

I'm using socket to build a simple "web browser", but I'm stuck at the start with a Bad Request result. Here is my code:
import socket
mysocket = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
URI = 'data.pr4e.org'
mysocket.connect((URI, 80))
cmd = "GET http://{0}/romeo.txt HTTP/1.0\n\n".format(URI).encode()
mysocket.send(cmd) # send a request
while True:
    data = mysocket.recv(512) # receive up to 512 bytes at a time
    # if there is no more data to receive, leave the loop
    if (len(data) < 1):
        break
    print(data.decode())
    pass
mysocket.close() # close connection
Here is the output:
HTTP/1.1 400 Bad Request
Date: Mon, 15 Feb 2021 14:36:06 GMT
Server: Apache/2.4.18 (Ubuntu)
Content-Length: 308
Connection: close
Content-Type: text/html; charset=iso-8859-1
<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0//EN">
<html><head>
<title>400 Bad Request</title>
</head><body>
<h1>Bad Request</
h1>
<p>Your browser sent a request that this server could not understand.<br />
</p>
<hr>
<address>Apache/2.4.18 (Ubuntu) Server at do1.dr-chuck.com Port 80</address>
What am I doing wrong? Also, I tried replacing data.pr4e.org with facebook.com and youtube.com, and I get this output:
HTTP/1.1 301 Moved Permanently
Vary: Accept-Encoding
Location: https://facebook.com/
Content-Type: text/html; charset="utf-8"
X-FB-Debug: LPmWQm0VVptVpi8QX8/SxymrJg9ZoL/mL+W+G4pZA4HGj5WI5YIG1s8sgqwp6TIleGvUg3U1eDNEhGoCsaJG5g==
Date: Mon, 15 Feb 2021 14:52:43 GMT
Alt-Svc: h3-29=":443"; ma=3600,h3-27=":443"; ma=3600
Connection: close
Content-Length: 0
thank you
The problem here is simply that you used \n where the server expects \r\n as the end of line.
Also, since you connect directly to the HTTP host, you should not put the full URI in the request line. This conforms better to HTTP/1.0:
cmd = "GET /romeo.txt HTTP/1.0\r\n\r\n".encode()
But if the server may host more than one virtual server, you should pass the name in a Host header:
cmd = "GET /romeo.txt HTTP/1.0\r\nHost: {}\r\n\r\n".format(URI).encode()
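If you start from a full URL, the path for the request line and the name for the Host header can be split out with the standard library. This is a minimal sketch under my own assumptions: the helper name request_from_url is hypothetical, and it only handles plain GET requests without a body.

```python
from urllib.parse import urlsplit

def request_from_url(url, version="HTTP/1.0"):
    """Build a GET request with a path-only target and a Host header."""
    parts = urlsplit(url)
    path = parts.path or "/"          # an empty path means the root document
    if parts.query:
        path += "?" + parts.query
    lines = [
        "GET {} {}".format(path, version),
        "Host: {}".format(parts.netloc),
        "",                           # the two empty items produce the
        "",                           # blank line that ends the headers
    ]
    return "\r\n".join(lines).encode("ascii")

print(request_from_url("http://data.pr4e.org/romeo.txt"))
# b'GET /romeo.txt HTTP/1.0\r\nHost: data.pr4e.org\r\n\r\n'
```

Joining the lines with \r\n in one place makes it harder to slip a bare \n into the request by accident.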

Getting 400 Bad Request error with Python Socket GET request

For a school assignment I need to send GET requests and receive the data using only sockets. I keep getting an HTTP/1.1 400 Bad Request error, no matter how I format the GET request.
Here is my code (please excuse me if it's terrible; this is my first ever Python project):
import socket
import sys
try:
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    print ("Creation successful")
except socket.error as err:
    print ("Creation unsuccessful. Error: %s" %(err))
port = 80
try:
    ip = socket.gethostbyname('gaia.cs.umass.edu')
except socket.gaierror:
    print("Error resolving host")
    sys.exit()
s.connect((ip, port))
print("Connection successful")
print("Connected to %s" %(ip))
try:
    s.sendall("GET wireshark-labs/HTTP-ethereal-lab-file3.html HTTP/1.1\r\nHost: gaia.cs.umass.edu\r\n\r\n".encode())
    while True:
        data = s.recv(1024)
        print("Data received")
        print(data.decode('UTF-8'))
        if not data:
            print("No Data")
            s.close()
            break
except socket.gaierror:
    print("Error sending data")
    sys.exit()
And this is the error I receive:
HTTP/1.1 400 Bad Request
Date: Mon, 23 Nov 2020 07:04:03 GMT
Server: Apache/2.4.6 (CentOS) OpenSSL/1.0.2k-fips PHP/7.4.12 mod_perl/2.0.11 Perl/v5.16.3
Content-Length: 226
Connection: close
Content-Type: text/html; charset=iso-8859-1
<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0//EN">
<html><head>
<title>400 Bad Request</title>
</head><body>
<h1>Bad Request</h1>
<p>Your browser sent a request that this server could not understand.<br />
</p>
</body></html>
Trying to get this GET request to work is driving me insane, thanks in advance for any help you all send my way.
Sorry, I cannot write this as a comment because including the output is necessary. I was experimenting with your code, and when I changed the host and the wireshark-labs part, a strange thing happened. I changed it to
s.sendall(b"GET / HTTP/1.1\r\nHost: www.cnn.com\r\n\r\n")
to understand how this works, and the response I get is:
Creation successful
Connection successful
Connected to 128.119.245.12
Data received
HTTP/1.1 200 OK
Date: Mon, 23 Nov 2020 07:48:51 GMT
Server: Apache/2.4.6 (CentOS) OpenSSL/1.0.2k-fips PHP/7.4.12 mod_perl/2.0.11 Perl/v5.16.3
Last-Modified: Tue, 01 Mar 2016 18:57:50 GMT
ETag: "a5b-52d015789ee9e"
Accept-Ranges: bytes
Content-Length: 2651
Content-Type: text/html; charset=UTF-8
<html>
<head>
<title>Computer Network Research Group - UMass Amherst
</title>
</head>
<body bgcolor="#ffffff">
<center>
<p><img
src="cnrg_imap.jpg"
border="0" usemap="#cnrg_imapMAP">
<map name="cnrg_imapMAP">
<area coords="290,177,407,205" shape="rect" href="/networks/resources/index.html">
<area coords="163,178,275,205" shape="rect" href="/networks/education/index.html">
<area coords="62,165,145,191" shape="rect" href="/search.html">
<area coords="6,63,157,90" shape="rect" href="/networks/collaborations.html">
<area coords="64,7,146,34" shape="rect" href="/networks/people.html">
<area coords="163,7,270,33" shape="rect" href="/networks/research.html">
<area coords="288,6,417,33" shape="rect"
href="/networks/
Data received
publications.html">
</map>
<P>
<BR>
<BR>
<P>
<table width=100% border=0 cellpadding=0 cellspacing=0>
<tr>
<td width=60> </td>
<td width="10%"> </td>
<td width="70%">
<font face="Helvetica,Arial,TimesNewRoman" color="#000000">
<h4>
The Computer Networks Research Group in the School of Information and Computer Sciences at
the University of Massachusetts, Amherst is led by Professors Jim Kurose , and Don Towsley.
Our research spans a broad range of topics in networking, including network protocols and architecture, modeling and analysis, sensor networks, wireless networks, and network measurement. We seek a principled understanding of new and emerging areas through a complementary mix of theoretical and applied experimental research.
</td>
<td width="10%"> </td>
</
Data received
tr>
</table>
</h4>
</font>
<p>
<!-- QUP Button & QualNet link markup start -->
<a href="http://www.scalable-networks.com/customers/qup/index.php">
<img src="http://www.scalable-networks.com/images/qupmember.gif" border=0
alt="QualNet Network Simulator University Program"></a>
QualNet Network Simulator
<!-- End of QUP Button HTML & QualNet link markup -->
<p>
<p>
<center>
<font size="1">
PEOPLE |
RESEARCH |
PUBLICATIONS |
COLLABORATIONS |
SEARCH |
EDUCATION |
<a href="/networks/resources/index.html">
RESOURCES</a>
<p>
</font>
</center>
</body>
</html>
Data received
No Data
Process finished with exit code 0
This is what you are looking for. It would seem that the Host value in the sendall call is overwritten somewhere. I don't know how much of a part the wireshark-labs path plays in this. Hope the information helps. Also, when I paste your code into PyCharm, it gives a warning at s.connect((ip, port)): s could be undefined.
P.S. I'm not an expert in socket programming. Please excuse any mistakes; I'm just trying to help.
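One visible difference between the failing request in the question and the request that returned 200 above is the request target: the failing one (wireshark-labs/HTTP-ethereal-lab-file3.html) has no leading slash, while the working one is just /. In HTTP/1.1, a target sent directly to a server normally starts with /. Whether that alone explains the 400 from this particular server is an assumption on my part. The function below is a rough, hypothetical helper for spotting the difference, not a complete parser.

```python
def classify_request_target(target):
    """Roughly classify an HTTP/1.1 request target (simplified sketch)."""
    if target == "*":
        return "asterisk-form"      # used with OPTIONS
    if target.startswith("/"):
        return "origin-form"        # the usual form sent directly to a server
    if target.startswith(("http://", "https://")):
        return "absolute-form"      # full URL, as sent to proxies
    return "unrecognized"           # authority-form (CONNECT) is not handled here

for line in ("GET wireshark-labs/HTTP-ethereal-lab-file3.html HTTP/1.1",
             "GET /wireshark-labs/HTTP-ethereal-lab-file3.html HTTP/1.1",
             "GET / HTTP/1.1"):
    method, target, version = line.split(" ")
    print(classify_request_target(target), "<-", target)
```

Only the first of the three lines is reported as unrecognized, so adding the leading slash to the path is the first thing I would try.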
This might help somebody: I had the same issue with the 400 Bad Request.
I looked at the answer of Abhishek Rai (thank you!) above and realized my HTTP request had only the newline escape \n and lacked the carriage return escape \r. When I changed every \n to \r\n, my issue was fixed.
I got the error "HTTP/1.1 400 Bad Request" after running the code below.
import socket
my_sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
my_sock.connect(('data.pr4e.org', 80))
cmd = 'GET http://data.pr4e.org/romeo.txt HTTP/1.0\n\n'.encode()
my_sock.send(cmd)
while True:
    data = my_sock.recv(512)
    if len(data) < 1:
        break
    print(data.decode())
my_sock.close()
I replaced \n\n with \r\n\r\n, and it worked after that.
The adjusted code and its output are below.
import socket
my_sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
my_sock.connect(('data.pr4e.org', 80))
cmd = 'GET http://data.pr4e.org/romeo.txt HTTP/1.0\r\n\r\n'.encode()
my_sock.send(cmd)
while True:
    data = my_sock.recv(512)
    if len(data) < 1:
        break
    print(data.decode())
my_sock.close()
HTTP/1.1 200 OK
Date: Thu, 08 Dec 2022 06:52:18 GMT
Server: Apache/2.4.18 (Ubuntu)
Last-Modified: Sat, 13 May 2017 11:22:22 GMT
ETag: "a7-54f6609245537"
Accept-Ranges: bytes
Content-Length: 167
Cache-Control: max-age=0, no-cache, no-store, must-revalidate
Pragma: no-cache
Expires: Wed, 11 Jan 1984 05:00:00 GMT
Connection: close
Content-Type: text/plain
But soft what light through yonder window breaks
It is the east and Juliet is the sun
Arise fair sun and kill the envious moon
Who is already s
ick and pale with grief

Python HTTP always 301 using sockets

I wrote a simple program to get some information from a website using Python,
but when I run the code below, it always returns the following 301 response. At the same time, my browser can visit the website without any trouble.
Please tell me why this happens and how to improve my code to avoid the problem.
HTTP/1.1 301 Moved Permanently
Date: Tue, 28 Aug 2018 14:26:20 GMT
Server: Apache
Referrer-Policy: origin-when-cross-origin
Location: https://www.ncbi.nlm.nih.gov/
Content-Length: 237
Connection: close
Content-Type: text/html; charset=iso-8859-1
<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0//EN">
<html><head>
<title>301 Moved Permanently</title>
</head><body>
<h1>Moved Permanently</h1>
<p>The document has moved here.</p>
</body></html>
import socket
searcher = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
searcher.connect(("www.ncbi.nlm.nih.gov", 80))
cmd = "GET https://www.ncbi.nlm.nih.gov/ HTTP/1.0\r\n\r\n".encode()
searcher.send(cmd)
while True:
    data = searcher.recv(512)
    if len(data)<1: break
    print(data.decode())
searcher.close()
You receive a 301 because the site redirects to its HTTPS version.
I don't know whether using sockets is mandatory, but if not, you can use requests, an easy-to-use library for making HTTP requests:
import requests
req = requests.get("http://www.ncbi.nlm.nih.gov")
html = req.text
With this, the 301 still happens, but requests follows it transparently.
If you want to do it with sockets, you have to add the SSL layer manually:
import socket
import ssl
searcher = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
searcher.connect(("www.ncbi.nlm.nih.gov", 443))
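# ssl.wrap_socket() was removed in Python 3.12; use an SSLContext instead.
# Unlike the old cert_reqs=ssl.CERT_NONE call, the default context verifies
# the server certificate and hostname.
context = ssl.create_default_context()
searcher = context.wrap_socket(searcher, server_hostname="www.ncbi.nlm.nih.gov")
cmd = "GET https://www.ncbi.nlm.nih.gov/ HTTP/1.0\r\n\r\n".encode()
searcher.send(cmd)
while True:
    data = searcher.recv(512)
    if len(data) < 1: break
    print(data.decode())
searcher.close()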

Python socket HTTP 1.1 CONNECT request without a valid response

Well, I just want to make the following simple program that tries to create an HTTPS tunnel to www.google.com on port 443. I first tried the following code:
import socket
def main():
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    s.connect(("www.google.com", 80))
    request = "CONNECT www.google.com:443 HTTP/1.1\n\n"
    s.send(request.encode())
    print(s.recv(4096).decode())
main()
The result of that was the following:
HTTP/1.1 405 Method Not Allowed
Content-Type: text/html; charset=UTF-8
Referrer-Policy: no-referrer
Content-Length: 1592
Date: Wed, 16 Aug 2017 07:56:14 GMT
Connection: close
<!DOCTYPE html>
<html lang=en>
<meta charset=utf-8>
<meta name=viewport content="initial-scale=1, minimum-scale=1, width=device-width">
<title>Error 405 (Method Not Allowed)!!1</title>
<style>
*{margin:0;padding:0}html,code{font:15px/22px arial,sans-serif}html{background:#fff;color:#222;padding:15px}body{margin:7% auto 0;max-width:390px;min-height:180px;padding:30px 0 15px}* > body{background:url(//www.google.com/images/errors/robot.png) 100% 5px no-repeat;padding-right:205px}p{margin:11px 0 22px;overflow:hidden}ins{color:#777;text-decoration:none}a img{border:0}@media screen and (max-width:772px){body{background:none;margin-top:0;max-width:none;padding-right:0}}#logo{background:url(//www.google.com/images/branding/googlelogo/1x/googlelogo_color_150x54dp.png) no-repeat;margin-left:-5px}@media only screen and (min-resolution:192dpi){#logo{background:url(//www.google.com/images/branding/googlelogo/2x/googlelogo_color_150x54dp.png) no-repeat 0% 0%/100% 100%;-moz-border-image:url(//www.google.com/images/branding/googlelogo/2x/googlelogo_color_150x54dp.png) 0}}@media only screen and (-webkit-min-device-pixel-ratio:2){#logo{background:url(//www.google.com/images/branding/googlelogo/2x/googlelogo_color_150x54dp.png) no-repeat;-webkit-background-size:100% 100%}}#logo{display:inline-block;height:54px;width:150px}
</style>
<a href=//www.google.com/><span id=logo aria-label=Google></span></a>
<p><b>405.</b> <ins>That’s an error.</ins>
<p>The request method <code>CONNECT</code> is inappropriate for the URL <code>/</code>. <ins>That’s all we know.</ins>
That means that the server does not allow this request. So I thought the problem was the port number and changed it to 443 (the port for HTTPS connections). The code is:
def main():
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    s.connect(("www.google.com", 443))
    request = "CONNECT www.google.com:443 HTTP/1.1\n\n"
    s.send(request.encode())
    print(s.recv(4096).decode())
main()
But it does not print a valid response as it should. It gives me an empty response.
My question is: why is that happening, and how can I make it work properly?
Note: I don't want to use the built-in urllib or urllib2 libraries. I want to do this with sockets.
HTTP
In your original connection to port 80, you are just using the wrong host:
import socket
def main():
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    s.connect(('google.com', 80))
    request = b'CONNECT google.com HTTP/1.1\n\n'
    s.send(request)
    print(s.recv(4096).decode())
main()
Response:
HTTP/1.0 200 Connection established
Or use the GET method right away:
request = b'GET http://google.com HTTP/1.1\n\n'
The response is the same as for the HTTPS request; the google.com host doesn't work for some reason.
HTTPS
To connect using HTTPS, you should wrap your socket in an SSL tunnel (not sure if that is the correct term); the GET method is ready to use right after connecting:
import socket
import ssl
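def main():
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    # ssl.wrap_socket() was removed in Python 3.12; use an SSLContext instead.
    context = ssl.create_default_context()
    s = context.wrap_socket(s, server_hostname='google.com')
    s.connect(('google.com', 443))
    request = b'GET google.com HTTP/1.1\n\n'
    s.send(request)
    print(s.recv(4096).decode())
main()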
Response:
HTTP/1.1 302 Found
Cache-Control: private
Content-Type: text/html; charset=UTF-8
Referrer-Policy: no-referrer
Location: https://www.google.ru/?gfe_rd=cr&ei=WwCUWc66L6qB3APs7ZPABA
Content-Length: 259
Date: Wed, 16 Aug 2017 08:20:43 GMT
Alt-Svc: quic=":443"; ma=2592000; v="39,38,37,35"
<HTML><HEAD><meta http-equiv="content-type" content="text/html;charset=utf-8">
<TITLE>302 Moved</TITLE></HEAD><BODY>
<H1>302 Moved</H1>
The document has moved
here.
</BODY></HTML>
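For background: CONNECT is defined as a request to a proxy, asking it to open a tunnel to another host and port. A web server such as www.google.com is not required to honor it, which is consistent with the 405 in the question. The sketch below only builds the request bytes you would send to a proxy. The function name build_connect_request is my own, and you would need a real proxy that permits CONNECT to try it out; none is assumed here.

```python
def build_connect_request(target_host, target_port):
    """Build a CONNECT request asking a proxy for a tunnel to host:port."""
    authority = "{}:{}".format(target_host, target_port)
    lines = [
        "CONNECT {} HTTP/1.1".format(authority),
        "Host: {}".format(authority),
        "",   # the two empty items produce the blank line
        "",   # that ends the header block
    ]
    return "\r\n".join(lines).encode("ascii")

print(build_connect_request("www.google.com", 443))
# b'CONNECT www.google.com:443 HTTP/1.1\r\nHost: www.google.com:443\r\n\r\n'
```

After the proxy answers with a 2xx status, the same socket carries the tunneled traffic, so that is the point where you would wrap it in TLS and send the GET request.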

Python - getting source code with socket

I want to send an HTTP GET request and receive the source code of a web page; this has to be done through sockets. I set the buffer size to 4096, but my script downloads only a small part of the page.
import socket
sock = socket.socket ( socket.AF_INET, socket.SOCK_STREAM )
sock.connect ( ( "edition.cnn.com", 80 ) )
host = socket.gethostbyname("edition.cnn.com")
sock.sendall('GET http://edition.cnn.com/index.html HTTP/1.1\r\n'\
+ 'User-Agent: agent123\r\n'\
+ 'Host: '+host+'\r\n'\
+ '\r\n')
print sock.recv(4096)
sock.close()
After I run this code, the data I get is:
HTTP/1.1 200 OK
Server: nginx
Date: Wed, 01 Jan 2014 18:31:25 GMT
Content-Type: text/html
Transfer-Encoding: chunked
Connection: keep-alive
Set-Cookie: CG=GR:44:Réthimnon; path=/
Last-Modified: Wed, 01 Jan 2014 18:31:22 GMT
Vary: Accept-Encoding
Cache-Control: max-age=60, private
Expires: Wed, 01 Jan 2014 18:32:25 GMT
ac2a
<!DOCTYPE HTML>
<html lang="en-US">
<head>
<title>CNN.com International - Breaking, World, Business, Sports, Entertainment and Video News</title>
<meta http-equiv="X-UA-Compatible" content="IE=edge"/>
<meta http-equiv="content-type" content="text/html;charset=utf-8"/>
<meta http-equiv="last-modified" content="2014-01-01T18:28:34Z"/>
<meta http-equiv="refresh" content="1800;url=http://edition.cnn.com/?refresh=1"/>
<meta name="robots" content="index,follow"/>
<meta name="googlebot" content="noarchive"/>
<meta name="description" content="CNN.com International delivers breaking news from across the globe and information on the latest top stories, business, sports and entertainment headlines. Follow the news as it happens through: special reports, videos, audio, photo galleries plus interactive maps and timelines."/>
<meta name="keywords" content="CNN, CNN news, CNN International, CNN International news, CNN Edition, Edition news, news, news online, breaking news, U.S. news, world news, global news, weather, business, CNN Money, sports, politics, law, technology, entertainment, education,
This isn't even the first 13 lines of the source code (view-source:http://edition.cnn.com/index.html).
Another problem: when I use google.com as the host
import socket
sock = socket.socket ( socket.AF_INET, socket.SOCK_STREAM )
sock.connect ( ( "google.com", 80 ) )
host = socket.gethostbyname("google.com")
sock.sendall('GET http://google.com/index.html HTTP/1.1\r\n'\
+ 'User-Agent: agent123\r\n'\
+ 'Host: '+host+'\r\n'\
+ '\r\n')
print sock.recv(4096)
sock.close()
I get this response
HTTP/1.1 301 Moved Permanently
Location: http://www.google.com/index.html
Content-Type: text/html; charset=UTF-8
Date: Wed, 01 Jan 2014 18:38:57 GMT
Expires: Fri, 31 Jan 2014 18:38:57 GMT
Cache-Control: public, max-age=2592000
Server: gws
Content-Length: 229
X-XSS-Protection: 1; mode=block
X-Frame-Options: SAMEORIGIN
Alternate-Protocol: 80:quic
<HTML><HEAD><meta http-equiv="content-type" content="text/html;charset=utf-8">
<TITLE>301 Moved</TITLE></HEAD><BODY>
<H1>301 Moved</H1>
The document has moved
here.
</BODY></HTML>
which says that the page has moved to the same address that I wanted to download...
sock.recv(4096) reads up to 4096 bytes; how much the call actually returns depends on how much data has already arrived. There is no guarantee that 4096 bytes will be available for reading in one go.
You'll have to continue to read from the socket until all data is received:
data = b''
chunk = sock.recv(4096)
while chunk:
    # an empty result means the server has closed the connection
    data += chunk
    chunk = sock.recv(4096)
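The read loop can be wrapped in a small helper and tried out locally without a real server. This is a sketch: recv_until_closed is a hypothetical name, and socket.socketpair() stands in for the network connection. Note that the loop only finishes when the other side closes the connection, so with a keep-alive response (as in the question's output) it will wait until the server gives up on the idle connection.

```python
import socket

def recv_until_closed(sock, bufsize=4096):
    """Collect everything the peer sends until it closes the connection."""
    chunks = []
    while True:
        chunk = sock.recv(bufsize)
        if not chunk:              # empty bytes: the peer closed its side
            break
        chunks.append(chunk)
    return b"".join(chunks)

# Local demonstration with a connected socket pair instead of a real server.
server_side, client_side = socket.socketpair()
server_side.sendall(b"x" * 5000)
server_side.close()
body = recv_until_closed(client_side, bufsize=1024)  # needs several recv calls
client_side.close()
print(len(body))  # 5000
```

Collecting the chunks in a list and joining once at the end avoids repeatedly copying a growing bytes object.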
Your request to http://google.com/index.html redirects to www.google.com, a different hostname. Adjust your request accordingly.
If you wanted to implement a full-on HTTP client, you'd have to parse the status line, process the 301 redirect response by parsing out the Location: header, and make a new connection to request the new URL given to you.
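As a rough illustration of that parsing step, the sketch below splits a raw response into status code, headers and body. It is deliberately simplified (no repeated headers, no chunked bodies), the function name parse_response_head is hypothetical, and the sample bytes are an abbreviated version of the 301 response shown in the question.

```python
def parse_response_head(raw):
    """Split a raw HTTP response into (status_code, headers dict, body bytes)."""
    head, _, body = raw.partition(b"\r\n\r\n")
    lines = head.decode("iso-8859-1").split("\r\n")
    # The status line looks like: HTTP/1.1 301 Moved Permanently
    status_code = int(lines[0].split(" ", 2)[1])
    headers = {}
    for line in lines[1:]:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return status_code, headers, body

# Abbreviated sample based on the response in the question.
sample = (b"HTTP/1.1 301 Moved Permanently\r\n"
          b"Location: http://www.google.com/index.html\r\n"
          b"\r\n")
status, headers, body = parse_response_head(sample)
if status in (301, 302, 303, 307, 308) and "location" in headers:
    print("redirect to", headers["location"])
```

A client would then open a new connection to the host named in the Location value and repeat the request there, ideally with a limit on how many redirects it follows.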
edition.cnn.com uses HTTP/1.0 and www.google.com uses HTTP/1.1. Maybe someone can chime in on how to tell which one to use.
This works for: www.google.com
import socket
import time
domain = 'www.google.com'
# must specify index.html for google
full_url = 'http://www.google.com/index.html'
mysock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
mysock.connect((domain, 80))
mysock.send('GET ' + full_url + ' HTTP/1.1\n\n')
while True:
    data = mysock.recv(512)
    time.sleep(2.0) # 2 second delay
    if len(data) < 1:
        break
    print data
mysock.close()
This works for: edition.cnn.com
Warning: Large output; consider adjusting recv(512) to a larger number or changing time.sleep(2.0) to 1 second.
import socket
import time
domain = 'cnn.com'
full_url = 'http://edition.cnn.com/'
mysock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
mysock.connect((domain, 80))
mysock.send('GET ' + full_url + ' HTTP/1.0\n\n')
while True:
    data = mysock.recv(512)
    time.sleep(2.0) # 2 second delay
    if len(data) < 1:
        break
    print data
mysock.close()
Both processes finished with exit code 0
