I am trying to programmatically check the output of a TCP stream. I am able to get the results of the TCP stream by turning on debug in HTTPConnection, but how do I read the data and evaluate it with, say, a regular expression? I keep getting "TypeError: expected string or buffer". Is there a way to convert the result to a string?
thanks!
SCRIPT:
from urllib2 import Request, urlopen, URLError, HTTPError
import urllib2
import cookielib
import httplib
import re
httplib.HTTPConnection.debuglevel = 1
p = re.compile('abc=..........')
cj = cookielib.CookieJar()
proxy_address = '192.168.232.134:8083' # change the IP:PORT, this one is for example
proxy_handler = urllib2.ProxyHandler({'http': proxy_address})
opener = urllib2.build_opener(proxy_handler, urllib2.HTTPCookieProcessor(cj), urllib2.HTTPHandler(debuglevel=1))
urllib2.install_opener(opener)
url = "http://www.google.com/" # change the url
req=urllib2.Request(url)
data=urllib2.urlopen(req)
m=p.match(data)
if m:
    print "Match found."
else:
    print "Match not found."
RESULTS:
send: 'GET hyperlink/ HTTP/1.1\r\nAccept-Encoding: identity\r\nHost: www.google.com\r\nConnection: close\r\nUser-Agent: Python-urllib/2.6\r\n\r\n'
reply: 'HTTP/1.1 303 See Other\r\n'
header: Location: hyperlink:8083/3240951276
header: Set-Cookie: abc=3240951276; path=/; domain=.google.com; expires=Thu, 31-Dec-2020 23:59:59 GMT
header: Content-Length: 0
send: 'GET hyperlink/3240951276 HTTP/1.1\r\nAccept-Encoding: identity\r\nHost: hyperlink\r\nConnection: close\r\nUser-Agent: Python-urllib/2.6\r\n\r\n'
reply: 'HTTP/1.1 303 See Other\r\n'
header: Location: hyperlink
header: Set-Cookie: abc=3240951276; path=/; expires=Thu, 31-Dec-2020 23:59:59 GMT
header: Content-Length: 0
send: 'GET http://www.google.com/ HTTP/1.1\r\nAccept-Encoding: identity\r\nHost: www.google.com\r\nCookie: abc=3240951276\r\nConnection: close\r\nUser-Agent: Python-urllib/2.6\r\n\r\n'
reply: 'HTTP/1.1 200 OK\r\n'
header: Date: Mon, 18 Oct 2010 21:09:32 GMT
header: Expires: -1
header: cache-control: max-age=0, private, private
header: Content-Type: text/html; charset=ISO-8859-1
header: Set-Cookie: PREF=ID=066bc785a2b15ef6:FF=0:TM=1287436172:LM=1287436172:S=mNiXaRhshpf8nLji; expires=Wed, 17-Oct-2012 21:09:32 GMT; path=/; domain=.google.com
header: Set-Cookie: NID=39=ur3gnXL80kEy4shKAh8_-XV8PhmS4G83slPcX9OD3L6uthQZw-wq7RUnB0WKGYR3F_QGoyZAyEPCvjdi69EXXq23dzvpuZSl_KU2o7pqcTB7Vym4co1LOXmi9YQGpbkb; expires=Tue, 19-Apr-2011 21:09:32 GMT; path=/; domain=.google.com; HttpOnly
header: Server: gws
header: X-XSS-Protection: 1; mode=block
header: Connection: close
header: Content-Length: 4676
header: X-Con-Reuse: 1
header: Content-Encoding: gzip
header: via: 1.1 HermesPrefetch (CID2627003316.AID3240951276.TID1)
header: X-Trace-Timing: Start=1287436172845, Sched=0, Dns=2, Con=11, RxS=28, RxD=35
Traceback (most recent call last):
File "C:\Documents and Settings\asdf\workspace\PythonScripts2\src\Test1.py", line 18, in <module>
m=p.match(data)
TypeError: expected string or buffer
The debug information httplib provides, which you see in your terminal, is not actually part of the object returned by urllib2.urlopen(). Instead, it's printed directly to your process's sys.stdout, and unfortunately there's no way to change this behaviour in httplib. It's not entirely clear to me what you're trying to achieve by "capturing" this output and running a regular expression over it, but if that's really what you want to do, you would need to replace sys.stdout with something else, such as a suitable StringIO object, and then somehow pick out the part of the output you care about.
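A minimal sketch of that approach, written for Python 3, where httplib is called http.client and contextlib.redirect_stdout does the sys.stdout swapping (and restoring) for you. The local server, the CookieHandler class and the cookie value are hypothetical stand-ins so the sketch runs offline. It uses re.search rather than match(), because match() only matches at the very start of the string.

```python
import contextlib
import http.client
import http.server
import io
import re
import threading


class CookieHandler(http.server.BaseHTTPRequestHandler):
    """Hypothetical local server that sets a cookie like the one in the question."""

    def do_GET(self):
        self.send_response(200)
        self.send_header("Set-Cookie", "abc=3240951276; path=/")
        self.send_header("Content-Length", "0")
        self.end_headers()

    def log_message(self, fmt, *args):
        pass  # keep the server's request log quiet


def capture_debug_output(host, port):
    """Make one GET request and return everything http.client printed."""
    buf = io.StringIO()
    with contextlib.redirect_stdout(buf):  # sys.stdout -> buf, restored on exit
        conn = http.client.HTTPConnection(host, port)
        conn.set_debuglevel(1)  # print send:/reply:/header: lines
        conn.request("GET", "/")
        conn.getresponse().read()
        conn.close()
    return buf.getvalue()


server = http.server.HTTPServer(("127.0.0.1", 0), CookieHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()
try:
    debug_text = capture_debug_output("127.0.0.1", server.server_address[1])
finally:
    server.shutdown()
    server.server_close()

# search() scans the whole text; match() only tries the start of the string.
match = re.search(r"abc=(\d+)", debug_text)
print("Match found: " + match.group(1) if match else "Match not found.")
```

The capture is all-or-nothing: everything http.client prints during the with-block ends up in the buffer, so the regex has to pick out the interesting line.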
However, keep in mind that all the information httplib produces in its debug output is available directly in your program. It's either based on stuff you pass to httplib (through urllib2) or it's part of the server's response, and thus available in the object returned by urllib2.urlopen(). That object is file-like rather than a string, which is why p.match(data) raises the TypeError; data.read() gives you the body as a string and data.info() gives you the headers. For example, it looks like you're trying to extract the cookie information, which you can get simply by reading the cookie from the CookieJar you're already providing. There doesn't seem to be any sensible reason to try to capture the output and parse it.
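A Python 3 sketch of reading the same information directly, without touching stdout (urllib2 became urllib.request and cookielib became http.cookiejar). The local server and its cookie are hypothetical stand-ins; ProxyHandler({}) only disables environment proxies so the local request isn't rerouted, and you would put your own ProxyHandler back in for the real proxy.

```python
import http.cookiejar
import http.server
import re
import threading
import urllib.request


class CookieHandler(http.server.BaseHTTPRequestHandler):
    """Hypothetical local server standing in for the real site."""

    def do_GET(self):
        body = b"<html>hello</html>"
        self.send_response(200)
        self.send_header("Set-Cookie", "abc=3240951276; path=/")
        self.send_header("Content-Type", "text/html; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, fmt, *args):
        pass


server = http.server.HTTPServer(("127.0.0.1", 0), CookieHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()
url = "http://127.0.0.1:%d/" % server.server_address[1]

cj = http.cookiejar.CookieJar()
opener = urllib.request.build_opener(
    urllib.request.ProxyHandler({}),  # ignore environment proxies for this local test
    urllib.request.HTTPCookieProcessor(cj),
)
try:
    with opener.open(url) as response:
        # The response is file-like, not a string: read() returns bytes,
        # which are decoded before running a str regex over them.
        body = response.read().decode("utf-8")
        headers = response.info()  # the response headers
finally:
    server.shutdown()
    server.server_close()

cookies = {c.name: c.value for c in cj}  # the cookie, straight from the jar
print(cookies.get("abc"))
print(headers["Set-Cookie"])
print(bool(re.search(r"hello", body)))
```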
Related
I get the messages below for every test step, which is a bit annoying. I need to process the console logs in a different way.
send: b'PUT /api/v2/superadmin_personal/item/14278b98-4430-4d2e-8301-1e30501da3b3 HTTP/1.1\r\nHost: abc.lab.com:8080\r\nUser-Agent: python-requests/2.27.1\r\nAccept-Encoding: gzip, deflate\r\nAccept: */*\r\nConnection: keep-alive\r\nAuthorization: Bearer 2c0717a7-b477-4e02-b1b5-df2a2757db70\r\nContent-Length: 137\r\nContent-Type: application/json\r\n\r\n'
send: b'{"endTime": "1646987482101", "status": "PASSED", "issue": null, "launchUuid": "f380b026-d7c9-4596-b80a-dcaec6fa82f2", "attributes": null}'
reply: 'HTTP/1.1 200 OK\r\n'
header: Cache-Control: no-cache, no-store, max-age=0, must-revalidate
header: Content-Type: application/json
header: Date: Fri, 11 Mar 2022 08:30:58 GMT
header: Expires: 0
header: Pragma: no-cache
header: X-Content-Type-Options: nosniff
header: X-Frame-Options: DENY
header: X-Xss-Protection: 1; mode=block
header: Content-Length: 93
You can set rp.http.logging=false in the reportportal.prop file or as a JVM parameter.
There is a common switch for all HTTP requests/responses Python sends:
from http.client import HTTPConnection
HTTPConnection.debuglevel = 0
Unfortunately, Python's HTTP client logs traffic with plain print() calls (as here) instead of using Python's own logging framework. That's really silly, but that's where Python is. Therefore there is no straightforward way to configure what you want logged and what you would like to skip; you can only turn console printing on or off for all HTTP requests.
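Since the switch is a class attribute, the most you can do without patching http.client is scope it. A minimal Python 3 sketch, assuming you only want the output around particular calls: http_debug is a hypothetical context manager, and while its with-block is active it still affects every connection in every thread. Connections that call set_debuglevel() themselves override the class attribute. The local server is only there so the sketch runs offline.

```python
import contextlib
import http.client
import http.server
import io
import threading


@contextlib.contextmanager
def http_debug(level=1):
    """Temporarily set the class-wide http.client debug level (hypothetical helper)."""
    old = http.client.HTTPConnection.debuglevel
    http.client.HTTPConnection.debuglevel = level
    try:
        yield
    finally:
        http.client.HTTPConnection.debuglevel = old  # restore even on errors


class OkHandler(http.server.BaseHTTPRequestHandler):
    """Hypothetical local server so the sketch runs offline."""

    def do_GET(self):
        self.send_response(200)
        self.send_header("Content-Length", "0")
        self.end_headers()

    def log_message(self, fmt, *args):
        pass


def fetch(port):
    conn = http.client.HTTPConnection("127.0.0.1", port)
    conn.request("GET", "/")
    conn.getresponse().read()
    conn.close()


server = http.server.HTTPServer(("127.0.0.1", 0), OkHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()
port = server.server_address[1]

quiet, noisy = io.StringIO(), io.StringIO()
with contextlib.redirect_stdout(quiet):
    fetch(port)  # default debuglevel 0: nothing is printed
with contextlib.redirect_stdout(noisy), http_debug():
    fetch(port)  # debuglevel 1 inside the block: send:/reply:/header: lines
server.shutdown()
server.server_close()
print(repr(quiet.getvalue()), "send:" in noisy.getvalue())
```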
By default, the httplib debug information (send, headers and reply) shows up as if it were logger.info output.
How can I display send, headers and reply as part of the DEBUG information instead?
import requests
import logging
import httplib
httplib.HTTPConnection.debuglevel = 1
logging.basicConfig() # you need to initialize logging, otherwise you will not see anything from requests
logging.getLogger().setLevel(logging.DEBUG)
requests_log = logging.getLogger("requests.packages.urllib3")
requests_log.setLevel(logging.DEBUG)
requests_log.propagate = True
requests.get('http://httpbin.org/headers')
It prints
INFO:requests.packages.urllib3.connectionpool:Starting new HTTP Connection (1): httpbin.org
send: 'GET /headers HTTP/1.1\r\nHost: httpbin.org\r\nConnection: keep-alive\r\nAccept-Encoding: gzip, deflate\r\nAccept: */*\r\nUser-Agent: python-requests/2.8.1\r\n\r\n'
reply: 'HTTP/1.1 200 OK\r\n'
header: Server: nginx
header: Date: Mon, 14 Dec 2015 12:50:44 GMT
header: Content-Type: application/json
header: Content-Length: 156
header: Connection: keep-alive
header: Access-Control-Allow-Origin: *
header: Access-Control-Allow-Credentials: true
DEBUG:requests.packages.urllib3.connectionpool:"GET /headers HTTP/1.1" 200 156
<Response [200]>
Thanks @Eli.
I was able to achieve this using this post: http://stefaanlippens.net/redirect_python_print
import logging
import sys
import requests
import httplib
# HTTP stream handler
class WritableObject:
    def __init__(self):
        self.content = []
    def write(self, string):
        self.content.append(string)
# A writable object
http_log = WritableObject()
# Redirection
sys.stdout = http_log
# Enable
httplib.HTTPConnection.debuglevel = 2
# get operation
requests.get('http://httpbin.org/headers')
# Remember to reset sys.stdout!
sys.stdout = sys.__stdout__
debug_info = ''.join(http_log.content).replace('\\r', '').decode('string_escape').replace('\'', '')
# Remove empty lines
debug_info = "\n".join([ll.rstrip() for ll in debug_info.splitlines() if ll.strip()])
print debug_info
It prints:
C:\Users\vkosuri\Dropbox\robot\lab>python New-Redirect_Stdout.py
send: GET /headers HTTP/1.1
Host: httpbin.org
Connection: keep-alive
Accept-Encoding: gzip, deflate
Accept: */*
User-Agent: python-requests/2.8.1
reply: HTTP/1.1 200 OK
header: Server: nginx
header: Date: Tue, 15 Dec 2015 09:36:36 GMT
header: Content-Type: application/json
header: Content-Length: 156
header: Connection: keep-alive
header: Access-Control-Allow-Origin: *
header: Access-Control-Allow-Credentials: true
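The decode('string_escape') step above only exists in Python 2. A Python 3 sketch of the same clean-up, assuming the captured text holds http.client's send:/reply: lines, which are printed as Python repr() strings: ast.literal_eval undoes the repr(), and bytes are decoded as latin-1 (an assumption that the traffic is ASCII/latin-1 text). prettify_http_debug is a hypothetical helper name.

```python
import ast


def prettify_http_debug(raw):
    """Turn captured http.client debug output into readable, non-empty lines."""
    out = []
    for line in raw.splitlines():
        prefix, sep, rest = line.partition(": ")
        if sep and prefix in ("send", "reply"):
            try:
                value = ast.literal_eval(rest)  # undo repr(): b'...' or '...'
            except (ValueError, SyntaxError):
                value = rest  # not a repr after all; keep it verbatim
            if isinstance(value, bytes):
                value = value.decode("latin-1")
            if not isinstance(value, str):
                value = str(value)
            parts = [ln for ln in value.splitlines() if ln.strip()]
            if parts:
                out.append("%s: %s" % (prefix, parts[0]))
                out.extend(parts[1:])
        elif line.strip():
            out.append(line.rstrip())  # e.g. "header: ..." lines pass through
    return "\n".join(out)


sample = (
    "send: b'GET /headers HTTP/1.1\\r\\nHost: httpbin.org\\r\\n\\r\\n'\n"
    "reply: 'HTTP/1.1 200 OK\\r\\n'\n"
    "header: Server: nginx\n"
)
print(prettify_http_debug(sample))
```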
Thanks
Malli
some_logger.setLevel() does not do what you think it does. It doesn't change the level of the records a logger emits. It sets a threshold: messages less severe than that level are ignored by the logger and never reach your handlers. To do what you're asking, I can only think of one real, reasonable way:
Capture the logs as they're coming in and re-log them. You can capture them with the idea described here, and use that in a subclass of requests. This would without a doubt be complicated. So, this is probably a good time to start asking yourself, "what am I really trying to achieve and is there another way to go about it?"
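For completeness, here is a minimal Python 3 sketch of the capture-and-re-log idea that avoids subclassing requests. It relies on an implementation detail rather than a documented API: http.client calls the builtin print() by name, so assigning a function to http.client.print shadows it for that module only. Because requests sits on urllib3, which sits on http.client, the same patch should cover requests traffic too. The logger name and the local server are hypothetical.

```python
import http.client
import http.server
import logging
import threading

# Hypothetical logger name; use whatever fits your application.
http_log = logging.getLogger("http.client.debug")
http_log.setLevel(logging.DEBUG)


def _print_to_log(*args, **kwargs):
    """Stand-in for print() inside http.client: re-emit the text at DEBUG level."""
    http_log.debug(" ".join(str(a) for a in args))


# Implementation detail, not a documented API: http.client looks up print()
# as a global name, so a module attribute called "print" shadows the builtin
# for that module only. Undo with: del http.client.print
http.client.print = _print_to_log
http.client.HTTPConnection.debuglevel = 1


class OkHandler(http.server.BaseHTTPRequestHandler):
    """Hypothetical local server so the sketch runs offline."""

    def do_GET(self):
        self.send_response(200)
        self.send_header("Content-Length", "0")
        self.end_headers()

    def log_message(self, fmt, *args):
        pass


def fetch(port):
    conn = http.client.HTTPConnection("127.0.0.1", port)
    conn.request("GET", "/")
    conn.getresponse().read()
    conn.close()


logging.basicConfig(format="%(levelname)s:%(name)s:%(message)s")
server = http.server.HTTPServer(("127.0.0.1", 0), OkHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()
fetch(server.server_address[1])  # send:/reply:/header: now arrive as DEBUG records
server.shutdown()
server.server_close()
```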
I have the exact same code on 2 servers. With one of them I can connect to Amazon SQS, while the other one can't. Here is the output from the non-working server:
send: 'GET /?Action=GetQueueUrl&QueueName=Erablitek&Version=2012-11-05 HTTP/1.1\r\nAccept-Encoding: identity\r\nContent-Length: 0\r\nHost: queue.amazonaws.com\r\nAuthorization: AWS4-HMAC-SHA256 Credential=AKIAIOQSTDBQVPXWYK7A/20150219/us-east-1/sqs/aws4_request,SignedHeaders=host;x-amz-date,Signature=9f5b0a187b178974f7b9b28e0028c2f9c034ee6fa2b1ee3ea9fcf9c3370219d5\r\nX-Amz-Date: 20150219T155308Z\r\nUser-Agent: Boto/2.34.0 Python/2.7.3 Linux/3.12.31+\r\n\r\n'
reply: 'HTTP/1.1 403 Forbidden\r\n'
header: Server: Server
header: Date: Thu, 19 Feb 2015 10:53:13 GMT
header: Content-Type: text/xml
header: Content-Length: 367
header: Connection: keep-alive
header: x-amzn-RequestId: 717fcf52-963b-5c4b-8f22-820d54e28cb4
And here is the output from the working server
send: 'GET /?Action=GetQueueUrl&QueueName=Erablitek&Version=2012-11-05 HTTP/1.1\r\nAccept-Encoding: identity\r\nContent-Length: 0\r\nHost: queue.amazonaws.com\r\nAuthorization: AWS4-HMAC-SHA256 Credential=AKIAIOQSTDBQVPXWYK7A/20150219/us-east-1/sqs/aws4_request,SignedHeaders=host;x-amz-date,Signature=a9538654d3b281156cbb5a410717e80381cac1e19c9ffcd8d96589c25ed6256d\r\nX-Amz-Date: 20150219T110853Z\r\nUser-Agent: Boto/2.35.2 Python/2.7.3 Linux/3.12.31+\r\n\r\n'
reply: 'HTTP/1.1 200 OK\r\n'
header: Server: Server
header: Date: Thu, 19 Feb 2015 11:08:55 GMT
header: Content-Type: text/xml
header: Content-Length: 321
header: Connection: keep-alive
header: x-amzn-RequestId: f29ed0a6-d762-5079-b26c-9df911e5c178
To my knowledge, both servers are configured and maintained pretty much the same way; however, they're installed in 2 different locations. I have checked, and the credentials are the same on both servers.
Edit: I have also tried several versions of boto, including 2.35.2, which is the one installed on the working server.
I have no idea what else I should be checking.
The QueueName parameter appears to be different in the two requests. In the first one (the non-working one) the queue name is:
QueueName=ErabliTEK
and in the second, working example it is:
QueueName=Erablitek
Could that be your problem?
Say I have the following HTTP request:
GET /4 HTTP/1.1
Host: graph.facebook.com
And the server returns the following response:
HTTP/1.1 200 OK
Access-Control-Allow-Origin: *
Cache-Control: private, no-cache, no-store, must-revalidate
Content-Type: text/javascript; charset=UTF-8
ETag: "539feb8aee5c3d20a2ebacd02db380b27243b255"
Expires: Sat, 01 Jan 2000 00:00:00 GMT
Pragma: no-cache
X-FB-Rev: 1070755
X-FB-Debug: pC4b0ONpdhLwBn6jcabovcZf44bkfKSEguNsVKuSI1I=
Date: Wed, 08 Jan 2014 01:22:36 GMT
Connection: keep-alive
Content-Length: 172
{"id":"4","name":"Mark Zuckerberg","first_name":"Mark","last_name":"Zuckerberg","link":"http:\/\/www.facebook.com\/zuck","username":"zuck","gender":"male","locale":"en_US"}
Since the Content-Length header depends on the length of the content, I cannot simply split on the string Content-Length: 172. How can I extract the JSON and the headers separately? Both are important to my program.
I am using this code to get the response:
s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
s.connect(("graph.facebook.com", 80))
s.send("GET /"+str(id)+"/picture HTTP/1.1\r\nHost: graph.facebook.com\r\n\r\n")
data = s.recv(1024)
s.close()
json_string = (somehow extract this)
userdata = json.loads(json_string)
The easy way to do this is to use an HTTP library. For example:
import json
import urllib2
r = urllib2.urlopen("http://graph.facebook.com/{}/picture".format(id))
json_string = r.read()
userdata = json.loads(json_string)
If you really want to parse it yourself, the HTTP protocol guarantees that headers and body are separated by an empty line, and that this will be the first empty line anywhere in the response, so it's not that hard:
data = s.recv(1024)
header, _, json_string = data.partition('\r\n\r\n')
userdata = json.loads(json_string)
There are some obvious downsides to this: as written, your code won't work if the response is longer than 1K, or if the kernel doesn't give you the whole response in a single recv (which it's never guaranteed to do), or if the server redirects you or gives you a 100 Continue before the real response, or if the server decides to send back a chunked or MIME-multipart or other response instead of a flat body, or…
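A sketch that addresses the first two downsides by looping on recv() until the full header block and then the full Content-Length body have arrived. It is written for Python 3, where sockets deal in bytes, and it still assumes no chunked encoding, no 1xx interim response and no redirect. recv_http_response is a hypothetical helper name, and the socket pair merely stands in for the connection to graph.facebook.com.

```python
import json
import socket


def recv_http_response(sock):
    """Read one HTTP response whose body length is given by Content-Length."""
    data = b""
    while b"\r\n\r\n" not in data:  # headers end at the first empty line
        chunk = sock.recv(4096)
        if not chunk:
            raise ConnectionError("connection closed before headers ended")
        data += chunk
    head, _, body = data.partition(b"\r\n\r\n")
    lines = head.decode("iso-8859-1").split("\r\n")
    status_line, header_lines = lines[0], lines[1:]
    headers = {}
    for line in header_lines:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()  # names are case-insensitive
    length = int(headers.get("content-length", "0"))
    while len(body) < length:  # keep reading until the whole body is here
        chunk = sock.recv(4096)
        if not chunk:
            raise ConnectionError("connection closed before body ended")
        body += chunk
    return status_line, headers, body[:length]


# Usage with a local socket pair standing in for the real server:
server_side, client_side = socket.socketpair()
payload = b'{"id":"4","username":"zuck"}'
server_side.sendall(
    b"HTTP/1.1 200 OK\r\nContent-Type: text/javascript; charset=UTF-8\r\n"
    b"Content-Length: " + str(len(payload)).encode() + b"\r\n\r\n" + payload
)
status, headers, body = recv_http_response(client_side)
server_side.close()
client_side.close()
userdata = json.loads(body)
print(status, headers["content-type"], userdata["username"])
```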
I'm trying to fetch data from http://book.libertorrent.com/, but at the moment I'm failing badly because some additional data (headers) is present in the response. My code is very simple:
response = urllib.urlopen('http://book.libertorrent.com/login.php')
f = open('someFile.html', 'w')
f.write(response.read())
read() returns:
Date: Fri, 09 Nov 2012 07:36:54 GMT
Content-Type: text/html; charset=utf-8
Transfer-Encoding: chunked
Connection: close
Cache-Control: no-cache, pre-check=0, post-check=0
Expires: 0
Pragma: no-cache
Set-Cookie: bb_test=973132321; path=/; domain=book.libertorrent.com
Content-Language: ru
1ec0
...Html...
0
And response.info() is empty.
Is there any way to correct the response?
Let's try this:
$ echo -ne "GET /index.php HTTP/1.1\r\nHost: book.libertorrent.com\r\n\r\n" | nc book.libertorrent.com 80 | head -n 10
HTTP/1.1 200 OK
WWW
Date: Sat, 10 Nov 2012 17:41:57 GMT
Content-Type: text/html; charset=utf-8
Transfer-Encoding: chunked
Connection: keep-alive
Content-Language: ru
1f57
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN" "http://www.w3.org/TR/html4/strict.dtd"><html dir="ltr">
See that "WWW" on the second line? That's not a valid HTTP header, and I'm guessing that's what's throwing off the response parser here.
By the way, Python 2 and Python 3 behave differently here:
Python 2 seems to immediately interpret anything after this invalid header as content.
Python 3 ignores all headers and continues reading the content after the double newline. Because the headers are ignored, so is the transfer encoding, and therefore the chunk lengths are interpreted as part of the body.
So in the end the problem is that the server is sending an invalid response, which should be fixed at the server's end.
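If the server can't be fixed, one client-side workaround is to decode the chunked body yourself: values like 1ec0 and the final 0 in the output above are the hexadecimal chunk-size lines of the Transfer-Encoding: chunked format. A minimal Python 3 sketch, assuming you already have the raw body bytes starting at the first chunk-size line; decode_chunked is a hypothetical helper, and any trailers after the last chunk are ignored.

```python
def decode_chunked(body):
    """Decode an HTTP/1.1 'Transfer-Encoding: chunked' body (bytes in, bytes out)."""
    out = bytearray()
    pos = 0
    while True:
        line_end = body.index(b"\r\n", pos)
        # Chunk size is hex, optionally followed by ";extension" data.
        size = int(body[pos:line_end].split(b";")[0].strip(), 16)
        pos = line_end + 2
        if size == 0:
            break  # last chunk; trailers (if any) are not parsed here
        out += body[pos:pos + size]
        pos += size + 2  # skip the chunk data and its trailing CRLF
    return bytes(out)


print(decode_chunked(b"5\r\nHello\r\n6\r\n World\r\n0\r\n\r\n"))
```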