Valid HTTP request - python

I'm writing a Python program that fetches the HTML at a URL by sending an HTTP request over a socket. I tried it against a page on a test web server I set up for this, and it worked with this request:
import socket
s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
s.connect(("localhost", 8080))
s.send(("GET / HTTP/1.1\r\nHost: localhost:8080").encode("utf8"))
x = s.recv(1024)
while not x:
    x = s.recv(1024)
print(x.decode("utf8"))
But when I try it on another site, I get a "Bad Request" response. How do I make this HTTP request valid for any site?
And how would I add GET and POST values to it?

The simplest way is
import requests
r = requests.get("https://www.stackoverflow.com")
print(r.text)

If you are trying to avoid pip packages, you can still make HTTP requests nicely with the standard library.
from urllib.request import urlopen
print(urlopen('http://localhost:8080').read())
I think it's possible the way you are doing it. You might need another header for the particular site, but I can't tell which one without knowing the website. Still, implementing an HTTP client yourself in Python is reinventing the wheel.
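One likely reason a stricter server rejects the request from the question is that its header block is never terminated: an HTTP/1.1 request must end its headers with an empty line (\r\n\r\n). The following is a minimal sketch, not a full client, of building such a request by hand. build_request and fetch are hypothetical helper names, and the sketch assumes GET values belong in the query string and POST values in a form-encoded body.

```python
import socket
from urllib.parse import urlencode


def build_request(method, host, path="/", params=None, form=None):
    """Build a minimal HTTP/1.1 request as bytes (hypothetical helper)."""
    if params:
        # GET values travel in the query string of the request target.
        path = path + "?" + urlencode(params)
    # POST values travel in the body, here form-encoded.
    body = urlencode(form).encode("ascii") if form else b""
    lines = [
        f"{method} {path} HTTP/1.1",
        f"Host: {host}",
        # Ask the server to close the socket so reading until EOF works.
        "Connection: close",
    ]
    if body:
        lines.append("Content-Type: application/x-www-form-urlencoded")
        lines.append(f"Content-Length: {len(body)}")
    # The empty line after the last header terminates the header block.
    head = "\r\n".join(lines) + "\r\n\r\n"
    return head.encode("ascii") + body


def fetch(host, port, request):
    """Send the request bytes and read the raw response until EOF."""
    with socket.create_connection((host, port)) as s:
        s.sendall(request)
        chunks = []
        while True:
            chunk = s.recv(4096)
            if not chunk:
                break
            chunks.append(chunk)
    return b"".join(chunks)
```

For the test server in the question, this would be used as fetch("localhost", 8080, build_request("GET", "localhost:8080")). Sites served over HTTPS additionally need a TLS-wrapped socket, which this sketch does not cover.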

Related

Python and socket - connect to specific path

I need to connect and send a message to http://localhost:8001/path/to/my/service, but I can't find out how to do that. I know how to send if I only have localhost and 8001, but I need this specific path, /path/to/my/service. That is where my service is running.
import socket
s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
s.connect(<full-url-to-my-service>)
s.sendall(bytes('Message', 'utf-8'))
Update
My service is running on localhost:8001/api/v1/namespaces/my_namespace/services/my_service:http/proxy. How can I connect to it with python?
As @furas said in the comments:
socket is primitive object and it doesn't have specialized method for this - and you have to on your own create message with correct data. You have to learn HTTP protocol and use it to send
This is a sample snippet that sends a GET request in Python using the requests library:
import requests
URL = 'http://localhost:8001/path/to/my/service'
response_text = requests.get(URL).text
print(response_text)
This assumes the URL responds to GET with text. If it returns JSON, a minor change is required:
import requests
URL = 'http://localhost:8001/path/to/my/service'
response_json = requests.get(URL).json()
print(response_json)
You can achieve the same with other libraries, such as urllib from the standard library.
Here is the documentation of requests library for reference
sendall() requires bytes, so a string must be encoded first:
s.sendall("foobar".encode())
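To make the point about the path concrete: a socket connects only to a host and port, and the path goes into the first line of the HTTP request. Here is a minimal sketch, assuming a plain http:// service; split_url and get are hypothetical helper names.

```python
import socket
from urllib.parse import urlsplit


def split_url(url):
    """Return (host, port, target) for an http:// URL (hypothetical helper)."""
    parts = urlsplit(url)
    host = parts.hostname
    port = parts.port or 80  # default HTTP port when the URL names none
    target = parts.path or "/"
    if parts.query:
        target += "?" + parts.query
    return host, port, target


def get(url):
    """Connect to host:port only, then name the path in the request line."""
    host, port, target = split_url(url)
    request = (
        f"GET {target} HTTP/1.1\r\n"
        f"Host: {host}:{port}\r\n"
        "Connection: close\r\n"
        "\r\n"
    ).encode("ascii")
    with socket.create_connection((host, port)) as s:
        s.sendall(request)
        response = b""
        while chunk := s.recv(4096):
            response += chunk
    return response
```

With the URL from the update, get("http://localhost:8001/api/v1/namespaces/my_namespace/services/my_service:http/proxy") would connect to localhost on port 8001 and request that path.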

HTTP Get Request "Moved Permanently" using HttpLib

Scope:
I am currently trying to write a web scraper for this specific page. I have a pretty strong web crawling background in C#, but httplib is beating me.
Problem:
When I make an HTTP GET request for the page specified above, I get a "Moved Permanently" response that points to the very same URL. I can make the request using the requests library, but I want to make it work using httplib so I can understand what I am doing wrong.
Code Sample:
I am completely new to Python, so any wrong language guideline or syntax is C#'s fault.
import httplib
# Wrapper for an "HTTP GET" request
class HttpClient(object):
    def HttpGet(self, url, host):
        connection = httplib.HTTPConnection(host)
        connection.request('GET', url)
        return connection.getresponse().read()
# Using the "HttpClient" class
httpclient = HttpClient()
# This is the full URL I need to make a GET request for: https://420101.com/strain-database
httpResponseText = httpclient.HttpGet('/strain-database', 'www.420101.com')
print httpResponseText
I really want to make it work using the httplib library, instead of requests or any other fancy one because I feel like I am missing something really small here.
The problem: I've had either too little or too much caffeine in my system.
To make an HTTPS request, I needed the HTTPSConnection class.
Also, there is no 'www' in the address I wanted to GET, so it shouldn't be included in the host.
Both of the wrong addresses redirect to the correct one with a 301 status code. If I were using requests or a more full-featured module, it would have followed the redirect automatically.
My Validation:
c = httplib.HTTPSConnection('420101.com')
c.request("GET", "/strain-database")
r = c.getresponse()
print r.status, r.reason
200 OK
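For readers on Python 3, httplib is named http.client. The sketch below shows one way to choose the connection class from the URL scheme and follow redirects by hand, which the low-level module does not do for you. open_connection and get_following_redirects are hypothetical names, and the limit of 5 redirects is an arbitrary assumption.

```python
import http.client
from urllib.parse import urljoin, urlsplit

REDIRECT_CODES = {301, 302, 303, 307, 308}


def open_connection(url):
    """Pick the connection class from the URL scheme; nothing is sent yet."""
    parts = urlsplit(url)
    if parts.scheme == "https":
        return http.client.HTTPSConnection(parts.netloc)
    return http.client.HTTPConnection(parts.netloc)


def get_following_redirects(url, max_redirects=5):
    """GET the URL, following Location headers up to max_redirects times."""
    for _ in range(max_redirects + 1):
        parts = urlsplit(url)
        target = parts.path or "/"
        if parts.query:
            target += "?" + parts.query
        conn = open_connection(url)
        try:
            conn.request("GET", target)
            response = conn.getresponse()
            location = response.getheader("Location")
            if response.status in REDIRECT_CODES and location:
                # Location may be relative, so resolve it against the current URL.
                url = urljoin(url, location)
                continue
            return response.status, response.reason, response.read()
        finally:
            conn.close()
    raise RuntimeError("too many redirects")
```

Called with 'http://www.420101.com/strain-database', a loop like this would be expected to follow the 301 described above rather than return it.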

WebService request with python suds by XML

I'm trying to communicate with a web server from Python using the suds library. I'm pretty new to this.
Usually, to communicate with this web server, I send an XML message and get a response. That is what I would like to do with Python.
Here's the code I wrote:
from suds.client import Client
with open("PATH","r") as f:
    file=f.read()
url='URL'
client = Client(url)
httpHeaders = {'Content-Type': 'text/xml', 'SOAPAction': 'ACTION'}
client.set_options(headers=httpHeaders)
Now I don't know how to make the request. I tried this:
print client.service.test(__inject={'msg': file})
But I got the error:
Exception: No services defined
The problem seems clear, but I don't know how to proceed. Any suggestions?

Python httplib POST request and proper formatting

I'm currently working on an automated way to interface with a database website that exposes RESTful web services. I am having trouble figuring out how to properly format and send the requests listed on the following site using Python.
https://neesws.neeshub.org:9443/nees.html
A particular example is this:
POST https://neesws.neeshub.org:9443/REST/Project/731/Experiment/1706/Organization
<Organization id="167"/>
The biggest problem is that I do not know where to put the XML part of the above. I want to send it as a Python HTTPS request, and so far I've been trying something with the following structure.
>>> import httplib
>>> conn = httplib.HTTPSConnection("neesws.neeshub.org:9443")
>>> conn.request("POST", "/REST/Project/731/Experiment/1706/Organization")
>>> conn.send('<Organization id="167"/>')
But this appears to be completely wrong. I've never used Python to talk to web services before, so my primary question is how exactly I am supposed to use httplib to send the POST request, particularly the XML part of it. Any help is appreciated.
You need to set some request headers before sending the data, for example Content-Type set to 'text/xml'. Check out these examples:
Post-XML-Python-1
Which has this code as example:
import sys, httplib
HOST = "www.example.com"
API_URL = "/your/api/url"
def do_request(xml_location):
    """HTTP XML POST request"""
    request = open(xml_location,"r").read()
    webservice = httplib.HTTP(HOST)
    webservice.putrequest("POST", API_URL)
    webservice.putheader("Host", HOST)
    webservice.putheader("User-Agent","Python post")
    webservice.putheader("Content-type", "text/xml; charset=\"UTF-8\"")
    webservice.putheader("Content-length", "%d" % len(request))
    webservice.endheaders()
    webservice.send(request)
    statuscode, statusmessage, header = webservice.getreply()
    result = webservice.getfile().read()
    print statuscode, statusmessage, header
    print result
do_request("myfile.xml")
Post-XML-Python-2
You may get some idea.
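In Python 3 the same idea can be sketched with http.client, where the body and headers are passed to request() directly instead of being sent afterwards with send(). build_xml_post and post_xml are hypothetical names, and the text/xml content type is an assumption about what the service accepts.

```python
import http.client


def build_xml_post(xml_text):
    """Return (body, headers) for an XML POST (hypothetical helper)."""
    body = xml_text.encode("utf-8")
    headers = {
        # Assumed content type; the target service may expect another one.
        "Content-Type": "text/xml; charset=UTF-8",
        # Length of the encoded bytes, not of the text.
        "Content-Length": str(len(body)),
    }
    return body, headers


def post_xml(host, path, xml_text):
    """POST the XML over HTTPS and return (status, reason, response body)."""
    body, headers = build_xml_post(xml_text)
    conn = http.client.HTTPSConnection(host)
    try:
        # The XML goes in the body argument of the same request() call.
        conn.request("POST", path, body=body, headers=headers)
        response = conn.getresponse()
        return response.status, response.reason, response.read()
    finally:
        conn.close()
```

For the example in the question this would be called as post_xml("neesws.neeshub.org:9443", "/REST/Project/731/Experiment/1706/Organization", '<Organization id="167"/>'). Whether that service needs further headers or authentication is not known here.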

How do I get HTTP header info without authentication using python?

I'm trying to write a small program that will simply display the header information of a website. Here is the code:
import urllib2
url = 'http://some.ip.add.ress/'
request = urllib2.Request(url)
try:
    html = urllib2.urlopen(request)
except urllib2.URLError, e:
    print e.code
else:
    print html.info()
If 'some.ip.add.ress' is google.com then the header information is returned without a problem. However if it's an ip address that requires basic authentication before access then it returns a 401. Is there a way to get header (or any other) information without authentication?
I've worked it out.
After the try block fails due to unauthorized access, the following modification prints the header information:
print e.info()
instead of:
print e.code
Thanks for looking :)
If you want just the headers, then instead of using urllib2 you should go lower level and use httplib:
import httplib
conn = httplib.HTTPConnection(host)
conn.request("HEAD", path)
print conn.getresponse().getheaders()
If all you want are the HTTP headers, you should make a HEAD request rather than a GET. The question "Python - HEAD request with urllib2" shows how to do this.
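In Python 3, where urllib2 became urllib.request, both answers can be combined: send a HEAD request and read the headers from the HTTPError when the server answers with 401. This is a minimal sketch; head_request and fetch_headers are hypothetical names.

```python
import urllib.error
import urllib.request


def head_request(url):
    """Build a Request that uses HEAD instead of the default GET."""
    return urllib.request.Request(url, method="HEAD")


def fetch_headers(url):
    """Return (status, headers) even when the server replies with an error."""
    try:
        with urllib.request.urlopen(head_request(url)) as response:
            return response.status, response.headers.items()
    except urllib.error.HTTPError as e:
        # An error response such as 401 still carries headers.
        return e.code, e.headers.items()
```

The headers come back as a list of (name, value) pairs, so repeated header names are preserved.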
