Python3 ThreadingHTTPServer fails to send chunked encoded response

I'm implementing a simple reverse proxy in Python3 and I need to send a response in transfer-encoding chunked mode.
I've taken my cues from this post, but I have some problems when sending the chunks in the format described here.
If I send chunks of length <= 9 bytes, the message is received correctly by the client. But when sending chunks of length >= 10 bytes, it seems that some of them are not received, and the message remains stuck in the client, waiting indefinitely.
Here is an example of non working code:
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

class ProxyHTTPRequestHandler(BaseHTTPRequestHandler):
    protocol_version = 'HTTP/1.1'

    def do_GET(self, body=True):
        # HTTP 200 + minimal HTTP headers in response
        self.send_response(200)
        self.send_header('transfer-encoding', 'chunked')
        self.send_header('Content-Type', 'text/plain')
        self.end_headers()
        # writing 5 chunks of 10 characters
        for i in range(5):
            text = str(i + 1) * 10  # concatenate 10 chars
            chunk = '{0:d}\r\n'.format(len(text)) + text + '\r\n'
            self.wfile.write(chunk.encode(encoding='utf-8'))
        # writing close sequence
        close_chunk = '0\r\n\r\n'
        self.wfile.write(close_chunk.encode(encoding='utf-8'))
def main():
    try:
        server_address = ('127.0.0.1', 8099)
        # I use ThreadingHTTPServer but the problem persists also with HTTPServer
        httpd = ThreadingHTTPServer(server_address, ProxyHTTPRequestHandler)
        print('http server is running')
        httpd.serve_forever()
    except KeyboardInterrupt:
        print(" ^C entered, stopping web server...")
        httpd.socket.close()

if __name__ == '__main__':
    main()
In this case, after several seconds, and only if I manually stop the Python process, the result in Postman is the following. Please note the missing "2222222222" chunk.
But if I use this length instead:
# writing the same 5 chunks of 9 characters
for i in range(5):
    text = str(i + 1) * 9  # concatenate 9 chars
    chunk = '{0:d}\r\n'.format(len(text)) + text + '\r\n'
    self.wfile.write(chunk.encode(encoding='utf-8'))
# writing close sequence
close_chunk = '0\r\n\r\n'
self.wfile.write(close_chunk.encode(encoding='utf-8'))
The communication ends correctly (after 6 ms, all 5 chunks are interpreted correctly).
Some version information:
HTTP Client: Postman 8.10
(venv) manuel@MBP ReverseProxy % python -V
Python 3.9.2
(venv) manuel@MBP ReverseProxy % pip freeze
certifi==2021.10.8
charset-normalizer==2.0.6
idna==3.2
requests==2.26.0
urllib3==1.26.7
Thanks in advance for any hints!

I post the solution (thanks to Martin Panter from bugs.python.org) in case anyone else has the same problem in the future.
The behaviour was caused by the chunk size field, which must be in hex format, not decimal.
Unfortunately the Mozilla docs did not specify the format, and their example only used lengths < 10. A formal definition is found here.
In conclusion, the working version is the following (using {0:x} instead of {0:d}):
# writing the same 5 chunks of 9 characters
for i in range(5):
    text = str(i + 1) * 9  # concatenate 9 chars
    chunk = '{0:x}\r\n'.format(len(text)) + text + '\r\n'
    self.wfile.write(chunk.encode(encoding='utf-8'))
# writing close sequence
close_chunk = '0\r\n\r\n'
self.wfile.write(close_chunk.encode(encoding='utf-8'))
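To make the framing harder to get wrong for chunks of any length, the size line can be wrapped in a small helper. This is just an illustrative sketch (write_chunk and end_chunks are names invented here, not part of http.server); note that the size must count the encoded bytes, not characters, which matters for non-ASCII text:

def write_chunk(wfile, text):
    # the chunk-size line must be hexadecimal: 10 bytes -> 'a', not '10'
    payload = text.encode('utf-8')
    wfile.write('{0:x}\r\n'.format(len(payload)).encode('ascii'))
    wfile.write(payload + b'\r\n')

def end_chunks(wfile):
    # terminating chunk: size zero followed by a final empty line
    wfile.write(b'0\r\n\r\n')

With these, the loop in do_GET becomes one write_chunk(self.wfile, text) per piece, followed by a single end_chunks(self.wfile).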


nfcpy retrieves the URL from an NFC tag. But how do I open the link?

What I want:
I want my Raspberry Pi to act as an NFC reader that can trigger the URL record from an NFC tag.
My setup is a Raspberry Pi with a PN532 NFC HAT and nfcpy. I am using the example tagtool.py, and right now I am able to scan the NFC tag and then show the URL (+ some extra data).
But I want the system to open the URL, which triggers a webhook on IFTTT (which then triggers a playlist on Spotify...).
What I have done so far:
I have used setup.py to install nfcpy and experimented a bit with the commands. But when I run the command
python3 tagtool.py --device tty:S0:pn532 -d nfc.ndef.UriRecord -l
It first returns this
[main] enable debug output for 'nfc.ndef.UriRecord'
[nfc.clf] searching for reader on path tty:S0:pn532
[nfc.clf] using PN532v1.6 at /dev/ttyS0
** waiting for a tag **
and then when I scan one of my NFC tags - which has a URL in a URI record - with the reader, I get this message.
Type2Tag 'NXP NTAG213' ID=04EA530A3E4D80
NDEF Capabilities:
  readable  = yes
  writeable = yes
  capacity  = 137 byte
  message   = 67 byte
NDEF Message:
record 1
  type = 'urn:nfc:wkt:U'
  name = ''
  data = b'\x04maker.ifttt.com/trigger/Playlist_022/with/key/bVTin_XXEEEDDDDEEEEEE'
[main] *** RESTART ***
[nfc.clf] searching for reader on path tty:S0:pn532
[nfc.clf] using PN532v1.6 at /dev/ttyS0
** waiting for a tag **
As you can see, the URL is right there under data (with a leading b'\x04' and without https://, but I guess that's quite easy to change). So basically I just need to trigger it.
I read somewhere that I could use curlify, so I ran 'pip3 install curlify' and made some changes to tagtool.py.
The original tagtool.py (which I believe is the most important part for what I am trying to do) looks like this
if tag.ndef:
    print("NDEF Capabilities:")
    print("  readable  = %s" % ("no", "yes")[tag.ndef.is_readable])
    print("  writeable = %s" % ("no", "yes")[tag.ndef.is_writeable])
    print("  capacity  = %d byte" % tag.ndef.capacity)
    print("  message   = %d byte" % tag.ndef.length)
    if tag.ndef.length > 0:
        print("NDEF Message:")
        for i, record in enumerate(tag.ndef.records):
            print("record", i + 1)
            print("  type =", repr(record.type))
            print("  name =", repr(record.name))
            print("  data =", repr(record.data))
In the new tagtool2.py I have added this to the start of the document
import curlify
import requests
And then I have added these two lines
response = requests.get("https://repr(record.data)")
print(curlify.to_curl(response.request))
Which means it looks like this. And this is probably wrong in several ways:
if tag.ndef:
    print("NDEF Capabilities:")
    print("  readable  = %s" % ("no", "yes")[tag.ndef.is_readable])
    print("  writeable = %s" % ("no", "yes")[tag.ndef.is_writeable])
    print("  capacity  = %d byte" % tag.ndef.capacity)
    print("  message   = %d byte" % tag.ndef.length)
    if tag.ndef.length > 0:
        print("NDEF Message:")
        for i, record in enumerate(tag.ndef.records):
            print("record", i + 1)
            print("  type =", repr(record.type))
            print("  name =", repr(record.name))
            print("  data =", repr(record.data))
            response = requests.get("https://repr(record.data)")
            print(curlify.to_curl(response.request))
Because when I try to trigger the URL with an NFC tag I get this message:
Type2Tag 'NXP NTAG213' ID=04EA530A3E4D80
NDEF Message:
record 1
  type = 'urn:nfc:wkt:U'
  name = ''
  data = b'\x04maker.ifttt.com/trigger/Metal1/with/key/bVTin_XXEEEDDDDEEEEEE'
[urllib3.connectionpool] Starting new HTTPS connection (1): repr(record.data):443
[nfc.clf] HTTPSConnectionPool(host='repr(record.data)', port=443): Max retries exceeded with url: / (Caused by NewConnectionError('<urllib3.connection.VerifiedHTTPSConnection object at 0xb579a650>: Failed to establish a new connection: [Errno -2] Name or service not known'))
Can anyone tell me what I am doing wrong? And if curlify is the right way to go?
What you are doing wrong: the data stored in the NDEF message is encoded, so you cannot just open a connection using the raw data. You have to decode it first according to the record type (the encoded value has a hex number in it).
It is also encoded in UTF-8, so Python treats it as bytes, not as a string object.
The type says it is a URI record, as you used 'nfc.ndef.UriRecord' (don't know why it is called urn instead).
The hex byte \x04 means https://
Unfortunately I don't think anybody has written a decoder method for the NFC URI specification, only encoders.
Here is a link to the full spec for the NDEF URI record type.
Once you have replaced the hex character in the data with the correct decoded value, you will get the URL https://maker.ifttt.com/trigger/Metal1/with/key/bVTin_XXEEEDDDDEEEEEE
A simple example (where a stores the value instead of record.data):
import re
import requests

a = b'\x04maker.ifttt.com/trigger/Metal1/with/key/bVTin_XXEEEDDDDEEEEEE'
a_text = a.decode('utf-8')
# replace the \x04 prefix byte with the scheme it stands for
x = re.sub('\x04', 'https://', a_text)
print(x)
requests.get(x)
Then you can use requests.get() on the decoded value.
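If a tag ever stores a URI with a prefix byte other than \x04, the same idea generalizes to the abbreviation table from the spec linked above. A hedged sketch (only the most common codes are listed here; decode_uri is an illustrative name, not part of nfcpy):

import requests

# the first payload byte selects a URI abbreviation
# (subset of the table in the NDEF URI record spec; 0x00 means no prefix)
URI_PREFIXES = {
    0x00: '',
    0x01: 'http://www.',
    0x02: 'https://www.',
    0x03: 'http://',
    0x04: 'https://',
}

def decode_uri(data):
    # data is e.g. b'\x04maker.ifttt.com/...'
    return URI_PREFIXES.get(data[0], '') + data[1:].decode('utf-8')

url = decode_uri(b'\x04maker.ifttt.com/trigger/Metal1/with/key/bVTin_XXEEEDDDDEEEEEE')
requests.get(url)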
This turned out to be a simple but good solution for my needs.
if tag.ndef:
    if tag.ndef.length > 0:
        #print("NDEF Message:")
        for i, record in enumerate(tag.ndef.records):
            print(record.uri)
            response = requests.get(record.uri)
            print(curlify.to_curl(response.request))
I started out with a way more complicated solution. I am keeping it here in case anybody runs into similar problems.
Since the first take, I have cleaned out some of the print() lines as well, and I know I can clean out some more lines, but I am keeping them here to make it easier to see what's happening.
It's especially worth noticing the y variable. I was left with an almost perfect URL, but I kept getting errors because of an extra ' at the end of the URL.
if tag.ndef:
    if tag.ndef.length > 0:
        for i, record in enumerate(tag.ndef.records):
            print(repr(record.data))
            print(str(record.data))
            org_string = str(record.data)
            # drop the leading "b'\x04" from the string representation
            mod_string = org_string[6:]
            # strip the trailing ' left over from the repr
            y = mod_string.rstrip(mod_string[-1])
            w = "https://"
            print(mod_string)
            print(y)
            print(w)
            response = requests.get(w + y)
            print(curlify.to_curl(response.request))
The code can be improved, but it works: it gives me this output and - more importantly - it triggers the URL on the NFC tag (I have scrambled the IFTTT webhook key).
Type2Tag 'NXP NTAG215' ID=04722801E14103
b'\x04maker.ifttt.com/trigger/python_test/with/key/bEDoi_gUT5x5uDsdsaR3Ao'
b'\x04maker.ifttt.com/trigger/python_test/with/key/bEDoi_gUT5x5uDsdsaR3Aoo'
maker.ifttt.com/trigger/python_test/with/key/bEDoi_gUT5x5uDsdsaR3Ao'
maker.ifttt.com/trigger/python_test/with/key/bEDoi_gUT5x5uDsdsaR3Ao
https://
[urllib3.connectionpool] Starting new HTTPS connection (1): maker.ifttt.com:443
[urllib3.connectionpool] https://maker.ifttt.com:443 "GET maker.ifttt.com/trigger/python_test/with/key/bEDoi_gUT5x5uDsdsaR3Ao HTTP/1.1" 200 69
curl -X GET -H 'Accept: */*' -H 'Accept-Encoding: gzip, deflate' -H 'Connection: keep-alive' -H 'User-Agent: python-requests/2.21.0' https://maker.ifttt.com/trigger/python_test/with/key/bEDoi_gUT5x5uDsdsaR3Ao
[main] *** RESTART ***
[nfc.clf] searching for reader on path tty:S0:pn532
[nfc.clf] using PN532v1.6 at /dev/ttyS0
** waiting for a tag **

Why do we need a carriage return \r before the newline character \n?

In the following code the HTTP protocol needs two newline characters, but what is the need for the \r there? Why can't we just add two \n and send the request?
import socket

mysock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
mysock.connect(("data.pr4e.org", 80))
cmd = 'GET http://data.pr4e.org/romeo.txt HTTP/1.0\r\n\r\n'.encode()  # here
mysock.send(cmd)
while True:
    data = mysock.recv(512)
    if len(data) > 0:
        print(data.decode())
    else:
        break
mysock.close()
Because that's how the HTTP protocol is defined. More specifically, HTTP 1.0 defines a request like this:
Request = Simple-Request | Full-Request

Full-Request = Request-Line
               *( General-Header
                | Request-Header
                | Entity-Header )
               CRLF
               [ Entity-Body ]

Request-Line = Method SP Request-URI SP HTTP-Version CRLF
Full-Request, which is what any HTTP 1.0 compatible client should use (Simple-Request is HTTP 0.9 and deprecated), needs to have two CRLF tokens (one is in Request-Line). A CRLF token is the two bytes \r\n. Hence the need to end the string in your example with \r\n\r\n.
This design choice was kept in HTTP 1.1.
Because that is how the HTTP protocol works.
The request/status line and headers must all end with <CR><LF> (that is, a carriage return followed by a line feed). The empty line must consist of only <CR><LF> and no other whitespace.
https://en.wikipedia.org/wiki/HTTP_message_body
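To make the framing concrete, here is the request from the question with both terminators spelled out; the first CRLF ends the request line, and the second one, standing on a line of its own, ends the (empty) header section:

# request line terminated by CRLF, then an empty line (a second CRLF)
cmd = b'GET http://data.pr4e.org/romeo.txt HTTP/1.0' + b'\r\n' + b'\r\n'

# with headers, a bare CRLF on a line of its own still marks the end
cmd = (b'GET /romeo.txt HTTP/1.0\r\n'
       b'Host: data.pr4e.org\r\n'
       b'\r\n')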

Twisted Server receive data stream via POST, read request.content.read() byte by byte in a deferred for over an hour

I'm intending to receive a binary stream of data via an HTTP POST call.
I believe the client side is working; that is, it writes chunks of bytes to the server, and I can see the amount of data being sent with tcpdump. Yet Twisted's request.content file-like object only starts producing output once the client disconnects.
This is what the server handler looks like:
# methods of a twisted.web Resource subclass
# (deferLater comes from twisted.internet.task, reactor from
#  twisted.internet, NOT_DONE_YET from twisted.web.server)
def render(self, request):
    if request.path == '/incoming-stream':
        d = deferLater(reactor, 0, lambda: request)
        d.addCallback(self.async_read)
        return NOT_DONE_YET

def async_read(self, request):
    sys.stdout.write('\nasync_read ' + str(request) + '\n')
    sys.stdout.flush()
    while True:
        byte = request.content.read(1)  # <--- read one byte
        if len(byte) > 0:
            sys.stdout.write(repr(byte))
            sys.stdout.flush()
        else:
            break
    sys.stdout.write('\nfinished ' + str(request) + '\n')
    sys.stdout.flush()
    request.write(b"finished")
    request.finish()
If I can't do this with POST, I have no problem switching over to WebSocket, but I'd first like to try to get this done via POST. The POSTs are long-running (one new POST request every hour, each staying alive and receiving data for an hour), with relatively high-bandwidth sensor data at approx 1 kbps.
I am aware that there are better methods of transferring the data (WebSocket, MQTT, AMQP), but POST and WebSocket will give me the least amount of trouble when receiving the data through an NGINX SSL endpoint. Currently NGINX is not being used (to rule out any buffering it could be causing).
Twisted Web does not support streaming uploads in its IResource abstraction.
See https://twistedmatrix.com/trac/ticket/288
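Until that ticket is resolved, one possible workaround (sketched here, not a drop-in replacement for the resource above) is to drop below IResource to a raw Twisted Protocol, whose dataReceived is invoked as bytes arrive rather than after the upload completes; the trade-off is that you must then parse any HTTP framing yourself:

import sys

from twisted.internet import protocol, reactor

class StreamReceiver(protocol.Protocol):
    def dataReceived(self, data):
        # called per TCP segment; nothing waits for the client to disconnect
        sys.stdout.write(repr(data) + '\n')
        sys.stdout.flush()

factory = protocol.Factory.forProtocol(StreamReceiver)
reactor.listenTCP(8080, factory)
reactor.run()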

Using python sockets to receive large http requests

I am using Python sockets to receive web-style and SOAP requests. The code I have is
import socket

svrsocket = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
host = socket.gethostname()
svrsocket.bind((host, 8091))
svrsocket.listen(1)
clientSocket, clientAddress = svrsocket.accept()
message = clientSocket.recv(4096)
Some of the SOAP requests I receive, however, are huge: 650 kB huge, and this could become several MB. Instead of the single recv I tried
message = ''
while True:
    data = clientSocket.recv(4096)
    if len(data) == 0:
        break
    message = message + data
but I never receive a 0-byte data chunk with Firefox or Safari, although the Python socket HOWTO says I should.
What can I do to get around this?
Unfortunately you can't solve this on the TCP level - HTTP defines its own connection management; see RFC 2616. This basically means you need to parse the stream (at least the headers) to figure out when a connection can be closed.
See related questions here - https://stackoverflow.com/search?q=http+connection
Hiya
Firstly I want to reinforce what the previous answer said
Unfortunately you can't solve this on the TCP level
Which is true, you can't. However, you can implement an HTTP parser on top of your TCP sockets. And that's what I want to explore here.
Let's get started
Problem and Desired Outcome
Right now we are struggling to find the end of a data stream. We expected our stream to end with a fixed ending, but now we know that HTTP does not define any message suffix.
And yet, we move forward.
There is one question we can now ask: "Can we ever know the length of the message in advance?" And the answer to that is YES! Sometimes...
You see, HTTP/1.1 defines a header called Content-Length, and as you'd expect, it holds exactly what we want: the content length. But there is something else in the shadows: Transfer-Encoding: chunked. Unless you really want to learn about it, we'll stay away from it for now.
Solution
Here is a solution. You're not gonna know what some of these functions are at first, but if you stick with me, I'll explain. Alright... Take a deep breath.
Assuming conn is a socket connection to the desired HTTP server
...
rawheaders = recvheaders(conn)
status, headers = dict_header(io.StringIO(rawheaders))
# Content-Length arrives as a string, so convert it before doing arithmetic
l_content = int(headers['Content-Length'])
# okay. we've got content length by magic
buffersize = 4096
message = b''
while True:
    if l_content <= 0:
        break
    data = conn.recv(buffersize)
    message += data
    l_content -= len(data)
...
As you can see, we enter the loop already knowing the Content-Length as l_content.
While we iterate, we keep track of the remaining content by subtracting the length of each conn.recv(buffersize) result from l_content.
When we've read at least as much data as l_content, we are done:
if l_content <= 0: break
Frustration
Note: For some of these next bits I'm gonna give pseudo code, because the code can be a bit dense
So now you're asking: what is rawheaders = recvheaders(conn), what is dict_header(io.StringIO(rawheaders)),
and HOW did we get headers['Content-Length']?!
For starters, recvheaders. The HTTP/1.1 spec doesn't define a message suffix, but it does define something useful: a suffix for the HTTP headers! The header section ends with an empty line, i.e. the byte sequence \r\n\r\n (CRLF twice). That means we know we've received all the headers once we read that sequence. So we can write a function like
def recvheaders(sock):
    rawheaders = b''
    # read until the blank line that terminates the header section
    while b'\r\n\r\n' not in rawheaders:
        rawheaders += sock.recv(1)
    # headers are ASCII-compatible; latin-1 never fails to decode
    return rawheaders.decode('iso-8859-1')
Next, parsing the headers.
def dict_header(ioheaders: io.StringIO):
    """
    parses an http response into the status line and headers
    """
    # here I expect ioheaders to be io.StringIO
    # the status line is always the first line
    status = ioheaders.readline().strip()
    headers = {}
    for line in ioheaders:
        item = line.strip()
        if not item:
            break
        # headers look like this
        # 'Header-Name' : 'Value'
        item = item.split(':', 1)
        if len(item) == 2:
            key, value = item
            headers[key.strip()] = value.strip()
    return status, headers
Here we read the status line, then we continue to iterate over every remaining line
and build [key, value] pairs from Header: Value with

item = line.strip()
item = item.split(':', 1)
# We do split(':', 1) to avoid cases like
# 'Header' : 'foo:bar' -> ['Header', 'foo', 'bar']
# when we want ---------> ['Header', 'foo:bar']

then we take that list and add it to the headers dict

# unpacking
# key = item[0], value = item[1]
key, value = item
headers[key] = value
BAM, we've created a map of headers.
From there headers['Content-Length'] falls right out.
So, this structure will work as long as you can guarantee that you will always receive Content-Length.
If you've made it this far, WOW, thanks for taking the time, and I hope this helped you out!
TL;DR: if you want to know the length of an HTTP message with sockets, write an HTTP parser.
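Putting the pieces together, usage might look like the sketch below, reusing recvheaders and dict_header from above and the svrsocket from the question; it assumes the client always sends Content-Length:

import io

clientSocket, clientAddress = svrsocket.accept()
status, headers = dict_header(io.StringIO(recvheaders(clientSocket)))
l_content = int(headers['Content-Length'])
message = b''
while l_content > 0:
    data = clientSocket.recv(4096)
    if not data:
        break  # client closed the connection early
    message += data
    l_content -= len(data)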

http checks python

Learning Python here. I want to check if anybody is running a web server on my local network, using this code, but it gives me a lot of errors in the console.
#!/usr/bin/env python
import httplib

last = 1
while last <> 255:
    url = "10.1.1." + "last"
    connection = httplib.HTTPConnection("url", 80)
    connection.request("GET", "/")
    response = connection.getresponse()
    print(response.status)
    last = last + 1
I do suggest changing the while loop to the more idiomatic for loop, and handling exceptions:
#!/usr/bin/env python
import httplib
import socket

for i in range(1, 256):
    try:
        url = "10.1.1.%d" % i
        connection = httplib.HTTPConnection(url, 80)
        connection.request("GET", "/")
        response = connection.getresponse()
        print url + ":", response.status
    except socket.error:
        print url + ":", "error!"
To see how to add a timeout to this, so it doesn't take so long to check each server, see here.
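For reference, httplib.HTTPConnection accepts a timeout argument in seconds (available since Python 2.6), so each probe can fail fast; socket.timeout is a subclass of socket.error, so the except clause above still catches it. Using url from the loop above:

# a 2-second cap per connection attempt keeps the scan moving
connection = httplib.HTTPConnection(url, 80, timeout=2)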
As pointed out, you have some basic quotation issues. But more fundamentally:

1. You're not using Pythonesque constructs to handle things but are coding them as simple imperative code. That's fine, of course, but below are examples of funner (and better) ways to express things.
2. You need to explicitly set timeouts or it'll take forever.
3. You need to multithread or it'll take forever.
4. You need to handle various common exception types or your code will crash: connections will fail (including timing out) under numerous conditions against real web servers.
5. 10.1.1.* is only one possible set of "local" servers. RFC 1918 spells out that the "local" ranges are 10.0.0.0 - 10.255.255.255, 172.16.0.0 - 172.31.255.255, and 192.168.0.0 - 192.168.255.255. The problem of generic detection of responders in your "local" network is a hard one.
6. Web servers (especially local ones) often run on ports other than 80 (notably 8000, 8001, or 8080).
7. The complexity of general web servers, DNS, etc. is such that you can get various timeout behaviors at different times (and affected by recent operations).

Below is some sample code to get you started, which pretty much addresses all of the above problems except (5), which I'll assume is (well) beyond the scope of the question.

BTW, I'm printing the size of the returned web page, since it's a simple "signature" of what the page is. The sample IPs return various Yahoo assets.
import urllib
import threading
import socket

def t_run(thread_list, chunks):
    t_count = len(thread_list)
    print "Running %s jobs in groups of %s threads" % (t_count, chunks)
    for x in range(t_count / chunks + 1):
        i = x * chunks
        i_c = min(i + chunks, t_count)
        c = len([t.start() for t in thread_list[i:i_c]])
        print "Started %s threads for jobs %s...%s" % (c, i, i_c - 1)
        c = len([t.join() for t in thread_list[i:i_c]])
        print "Finished %s threads for job index %s" % (c, i)

def url_scan(ip_base, timeout=5):
    socket.setdefaulttimeout(timeout)
    def f(url):
        # print "-- Trying (%s)" % url
        try:
            # the print will only complete if there's a server there
            r = urllib.urlopen(url)
            if r:
                print "## (%s) got %s bytes" % (url, len(r.read()))
            else:
                print "## (%s) failed to connect" % url
        except IOError, msg:
            # these are just the common cases
            if str(msg) == "[Errno socket error] timed out":
                return
            if str(msg) == "[Errno socket error] (10061, 'Connection refused')":
                return
            print "## (%s) got error '%s'" % (url, msg)
    # you might want 8000 and 8001, too
    return [threading.Thread(target=f,
                             args=("http://" + ip_base + str(x) + ":" + str(p),))
            for x in range(255) for p in [80, 8080]]

# run them (increase chunk size depending on your memory)
# also, try different timeouts
t_run(url_scan("209.131.36."), 100)
t_run(url_scan("209.131.36.", 30), 100)
Remove the quotes from the variable names last and url. Python is interpreting them as string literals rather than variables. Try this:
#!/usr/bin/env python
import httplib

last = 1
while last <> 255:
    url = "10.1.1.%d" % last
    connection = httplib.HTTPConnection(url, 80)
    connection.request("GET", "/")
    response = connection.getresponse()
    print(response.status)
    last = last + 1
You're trying to connect to a URL that is literally the string 'url': that's what the quotes you're using in
connection = httplib.HTTPConnection("url", 80)
mean. Once you remedy that (by removing those quotes) you'll be trying to connect to "10.1.1.last", given the quotes in the previous line. Set that line to
url = "10.1.1." + str(last)
and it could work!-)
