Keeping connection alive on a urllib2.request() call

Keeping the same format like this:
import urllib2
request = urllib2.Request('http://www.example.com', data)
response = urllib2.urlopen(request, timeout=4)
content = response.read()
Instead of using timeout=4, how can I keep the connection open for as long as the request takes?
Thanks in advance.

You can specify a very long timeout:
response = urllib2.urlopen(request, timeout=9999)
You should also look at requests, a much nicer library than urllib2:
requests.get('http://www.example.com')
By default, with no timeout argument, this call blocks until the server responds or the connection is closed.
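For readers on Python 3, where urllib2 became urllib.request, the same idea can be sketched as follows. The helper name fetch is hypothetical and not part of the original answer. The sketch relies on the fact that urlopen() without a timeout argument falls back to the socket module's default, which means "no timeout" unless socket.setdefaulttimeout() has been called somewhere in the program.

```python
import urllib.request


def fetch(url, data=None, timeout=None):
    """Return the response body, waiting as long as needed when timeout is None.

    `fetch` is a hypothetical helper written for this example.
    """
    request = urllib.request.Request(url, data=data)
    if timeout is None:
        # No timeout argument: urlopen() uses the socket module default,
        # which is "block indefinitely" unless socket.setdefaulttimeout()
        # was called earlier in the program.
        response = urllib.request.urlopen(request)
    else:
        response = urllib.request.urlopen(request, timeout=timeout)
    with response:
        return response.read()
```

Calling fetch('http://www.example.com') would then wait for the server for as long as it takes, while fetch('http://www.example.com', timeout=4) reproduces the behaviour from the question.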

Related

How to continuously pull data from a URL in Python?

I have a link, e.g. www.someurl.com/api/getdata?password=..., and when I open it in a web browser it sends a constantly updating document of text. I'd like to make an identical connection in Python, and dump this data to a file live as it's received. I've tried using requests.Session(), but since the stream of data never ends (and dropping it would lose data), the get request also never ends.
import requests
s = requests.Session()
x = s.get("www.someurl.com/api/getdata?password=...") #never terminates
What's the proper way to do this?

I found the answer I was looking for here: Python Requests Stream Data from API
Full implementation:
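import requests
# placeholder from the question; a real URL needs a scheme such as http://
url = "www.someurl.com/api/getdata?password=..."
s = requests.Session()
# open in binary append mode because iter_lines() yields bytes
with open('file.txt', 'ab') as fp:
    with s.get(url, stream=True) as resp:
        for line in resp.iter_lines(chunk_size=1):
            # iter_lines() strips the line ending, so add it back
            fp.write(line + b'\n')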
Note that chunk_size=1 is necessary for each complete line to be yielded as soon as it arrives, rather than waiting for an internal buffer to fill before iterating over all the lines. I believe chunk_size=None is meant to do this, but it doesn't work for me.

You can keep making GET requests to the URL:
import requests
import time
url = "www.someurl.com/api/getdata?password=..."
sess = requests.session()
while True:
    req = sess.get(url)
    time.sleep(10)

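This will terminate the request after 1 second:
import multiprocessing
import queue  # Python 3; the module is named Queue in Python 2
import requests
def get_from_url(url, result_queue):
    # Runs in the child process. A plain global would not be visible to the
    # parent, so the response body is handed back through the queue.
    result_queue.put(requests.get(url).text)
if __name__ == '__main__':
    url = "www.someurl.com/api/getdata?password=..."
    while True:
        result_queue = multiprocessing.Queue()
        p = multiprocessing.Process(target=get_from_url, name="get_from_url", args=(url, result_queue))
        p.start()
        try:
            # Wait up to 1 second for the GET request to finish
            data = result_queue.get(timeout=1)
        except queue.Empty:
            data = None
        p.terminate()
        p.join()
        # do something with the data
        print(data)  # or something else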

Python and socket - connect to specific path

I need to connect/send msg to http://localhost:8001/path/to/my/service, but I am not able to find how to do that. I know how to send if I only have localhost and 8001, but I need this specific path /path/to/my/service. There is where my service is running.
import socket
s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
s.connect(<full-url-to-my-service>)
s.sendall(bytes('Message', 'utf-8'))
Update
My service is running on localhost:8001/api/v1/namespaces/my_namespace/services/my_service:http/proxy. How can I connect to it with python?

As @furas said in the comments:
socket is a primitive object and it doesn't have a specialized method for this, so you have to create the message with the correct data on your own. You have to learn the HTTP protocol and use it to send the request.
This is a sample snippet that sends a GET request in Python using the requests library:
import requests
URL = 'http://localhost:8001/path/to/my/service'
response_text = requests.get(URL).text
print(response_text)
This assumes that the GET request returns text. If it returns JSON, a minor change is required:
import requests
URL = 'http://localhost:8001/path/to/my/service'
response_json = requests.get(URL).json()
print(response_json)
There are other ways to achieve the same result with other libraries, such as urllib.
Here is the documentation of the requests library for reference.

sendall() requires bytes, so a string must be encoded first:
s.sendall("foobar".encode())
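To make the point about writing the HTTP message yourself concrete, here is a minimal Python 3 sketch that puts the path into a hand-built GET request and sends it as bytes over a plain socket. The function names build_get_request and http_get are hypothetical, and the sketch assumes an unencrypted HTTP service that closes the connection after responding.

```python
import socket


def build_get_request(host, path):
    """Build the raw bytes of a minimal HTTP/1.1 GET request (hypothetical helper)."""
    lines = [
        f"GET {path} HTTP/1.1",  # the path goes in the request line, not in connect()
        f"Host: {host}",
        "Connection: close",     # ask the server to close the socket when done
        "",
        "",                      # blank line terminates the headers
    ]
    return "\r\n".join(lines).encode("ascii")


def http_get(host, port, path):
    """Send the request and return the raw response (status line, headers, body)."""
    # connect() only takes the host and port; the path is part of the message.
    with socket.create_connection((host, port)) as sock:
        sock.sendall(build_get_request(f"{host}:{port}", path))
        chunks = []
        while True:
            chunk = sock.recv(4096)
            if not chunk:        # empty bytes means the server closed the connection
                break
            chunks.append(chunk)
    return b"".join(chunks)
```

With the service from the question, the call would look like http_get('localhost', 8001, '/path/to/my/service'). The result still contains the status line and headers, which is one reason a library such as requests is usually more convenient.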

Best way to constantly request http data?

What is the best way to continuously request data from a server in Python? I've tried urllib3, but for some reason the Python script stops after a while. I am also trying urllib2 (see the code below), but I notice there's sometimes a huge delay (which did not happen as frequently with urllib3) and the response does not arrive every 0.5 seconds (sometimes it's every 6 seconds). What can I do to solve this?
import socket
import urllib2
import time
# timeout in seconds
timeout = 10
socket.setdefaulttimeout(timeout)
while True:
    try:
        # this call to urllib2.urlopen now uses the default timeout
        # we have set in the socket module
        req = urllib2.Request('https://www.okcoin.com/api/v1/future_ticker.do?symbol=btc_usd&contract_type=this_week')
        response = urllib2.urlopen(req)
        r = response.read()
        req2 = urllib2.Request('http://market.bitvc.com/futures/ticker_btc_week.js')
        response2 = urllib2.urlopen(req2)
        r2 = response2.read()
    except:
        continue
    print r + str(time.time())
    print r2 + str(time.time())
    time.sleep(0.5)

I think I found the problem: I needed to keep an HTTP session open, which lets me get the data more continuously. What's the best way of doing this? I am now using requests and create the session with "http = requests.Session()".
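A minimal Python 3 sketch of polling through one long-lived session is shown below. The function name poll and its parameters are hypothetical. It assumes the session object offers a get(url, timeout=...) method returning an object with a .text attribute, as requests.Session does, and it subtracts the time each request took so that the loop stays close to the intended interval.

```python
import time


def poll(session, url, count, interval=0.5, timeout=10):
    """Fetch `url` `count` times through one session (hypothetical helper).

    Reusing the session lets the underlying connection stay open between
    requests instead of being set up again every time.
    """
    bodies = []
    for _ in range(count):
        started = time.monotonic()
        response = session.get(url, timeout=timeout)
        bodies.append(response.text)
        # Sleep only for whatever is left of the interval after the request.
        elapsed = time.monotonic() - started
        time.sleep(max(0.0, interval - elapsed))
    return bodies
```

With requests installed, this could be called as poll(requests.Session(), url, count=10). Errors are deliberately not caught here, so a failed request ends the loop rather than being silently skipped.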

Reading HTTP server push streams with Python

I'm playing around trying to write a client for a site which provides data as an HTTP stream (aka HTTP server push). However, urllib2.urlopen() grabs the stream in its current state and then closes the connection. I tried skipping urllib2 and using httplib directly, but this seems to have the same behaviour.
The request is a POST request with a set of five parameters. There are no cookies or authentication required, however.
Is there a way to get the stream to stay open, so it can be checked each program loop for new contents, rather than waiting for the whole thing to be redownloaded every few seconds, introducing lag?

You could try the requests lib.
import requests
r = requests.get('http://httpbin.org/stream/20', stream=True)
for line in r.iter_lines():
    # filter out keep-alive new lines
    if line:
        print line
You could also add parameters:
import requests
settings = { 'interval': '1000', 'count':'50' }
url = 'http://agent.mtconnect.org/sample'
r = requests.get(url, params=settings, stream=True)
for line in r.iter_lines():
    if line:
        print line

Do you need to actually parse the response headers, or are you mainly interested in the content? And is your HTTP request complex, making you set cookies and other headers, or will a very simple request suffice?
If you only care about the body of the HTTP response and don't have a very fancy request, you should consider simply using a socket connection:
import socket
SERVER_ADDR = ("example.com", 80)
sock = socket.create_connection(SERVER_ADDR)
f = sock.makefile("r+", bufsize=0)
f.write("GET / HTTP/1.0\r\n"
        + "Host: example.com\r\n"  # you can put other headers here too
        + "\r\n")
# skip headers
while f.readline() != "\r\n":
    pass
# keep reading forever
while True:
    line = f.readline()  # blocks until more data is available
    if not line:
        break  # we ran out of data!
    print line
sock.close()
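The snippet above is Python 2. A rough Python 3 equivalent might look like the sketch below, where sockets carry bytes rather than text. The function names iter_body_lines and stream_lines are hypothetical, and the sketch assumes a plain HTTP server that answers an HTTP/1.0 request without chunked transfer encoding.

```python
import socket


def iter_body_lines(stream):
    """Yield body lines from a binary file-like HTTP response stream."""
    # Skip the status line and headers, which end at the first blank line.
    while stream.readline() not in (b"\r\n", b"\n", b""):
        pass
    # Yield body lines until the server closes the connection.
    while True:
        line = stream.readline()  # blocks until more data is available
        if not line:
            return                # we ran out of data
        yield line.rstrip(b"\r\n")


def stream_lines(host, path="/", port=80):
    """Open a plain HTTP/1.0 connection and yield body lines as they arrive."""
    with socket.create_connection((host, port)) as sock:
        request = f"GET {path} HTTP/1.0\r\nHost: {host}\r\n\r\n"
        sock.sendall(request.encode("ascii"))
        with sock.makefile("rb") as stream:
            yield from iter_body_lines(stream)
```

A loop such as "for line in stream_lines('example.com'): ..." then processes each line as soon as the server sends it.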

One way to do it using urllib2 (assuming the site also requires Basic Auth):
import urllib2
p_mgr = urllib2.HTTPPasswordMgrWithDefaultRealm()
url = 'http://streamingsite.com'
p_mgr.add_password(None, url, 'login', 'password')
auth = urllib2.HTTPBasicAuthHandler(p_mgr)
opener = urllib2.build_opener(auth)
urllib2.install_opener(opener)
f = opener.open('http://streamingsite.com')
while True:
    data = f.readline()  # blocks until the next line arrives
    if not data:
        break  # the server closed the stream
    # process data here

Getting TTFB (time till first byte) for an HTTP Request

Here is a python script that loads a url and captures response time:
import urllib2
import time
opener = urllib2.build_opener()
request = urllib2.Request('http://example.com')
start = time.time()
resp = opener.open(request)
resp.read()
ttlb = time.time() - start
Since my timer is wrapped around the whole request/response (including read()), this will give me the TTLB (time to last byte).
I would also like to get the TTFB (time to first byte), but am not sure where to start/stop my timing. Is urllib2 granular enough for me to add TTFB timers? If so, where would they go?

You should use pycurl rather than urllib2.
Install pycurl:
You can use pip or easy_install, or install it from source.
easy_install pyCurl
You may need superuser privileges.
Usage:
import pycurl
import sys
import json
WEB_SITES = sys.argv[1]
def main():
    c = pycurl.Curl()
    c.setopt(pycurl.URL, WEB_SITES)  # set the URL
    c.setopt(pycurl.FOLLOWLOCATION, 1)
    c.perform()  # execute the request; perform() returns None
    dns_time = c.getinfo(pycurl.NAMELOOKUP_TIME)  # DNS lookup time
    conn_time = c.getinfo(pycurl.CONNECT_TIME)  # time until the TCP connection is established
    starttransfer_time = c.getinfo(pycurl.STARTTRANSFER_TIME)  # time to first byte
    total_time = c.getinfo(pycurl.TOTAL_TIME)  # total time of the request
    c.close()
    data = json.dumps({'dns_time':dns_time,
                       'conn_time':conn_time,
                       'starttransfer_time':starttransfer_time,
                       'total_time':total_time})
    return data
if __name__ == "__main__":
    print main()

Using your current open / read pair there's only one other timing point possible - between the two.
The open() call should be responsible for actually sending the HTTP request, and should (AFAIK) return as soon as that has been sent, ready for your application to actually read the response via read().
Technically it's probably the case that a long server response would make your application block on the call to read(), in which case this isn't TTFB.
However if the amount of data is small then there won't be much difference between TTFB and TTLB anyway. For a large amount of data, just measure how long it takes for read() to return the first smallest possible chunk.

By default, the implementation of HTTP opening in urllib2 has no callbacks when read is performed. The OOTB opener for the HTTP protocol is urllib2.HTTPHandler, which uses httplib.HTTPResponse to do the actual reading via a socket.
In theory, you could write your own subclasses of HTTPResponse and HTTPHandler, and install it as the default opener into urllib2 using install_opener. This would be non-trivial, but not excruciatingly so if you basically copy and paste the current HTTPResponse implementation from the standard library and tweak the begin() method in there to perform some processing or callback when reading from the socket begins.
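A minimal sketch of that subclassing idea is given below, written for Python 3, where httplib and urllib2 became http.client and urllib.request. The class and function names (TimedResponse, TimedConnection, TimedHTTPHandler, timed_open) are hypothetical. Instead of copying the standard library code, the sketch overrides begin(), which reads the status line and headers, and records a timestamp when it returns. It covers plain http:// URLs only.

```python
import http.client
import time
import urllib.request


class TimedResponse(http.client.HTTPResponse):
    """HTTPResponse that records when the status line and headers were read."""

    headers_received_at = None

    def begin(self):
        super().begin()  # reads the status line and the headers
        self.headers_received_at = time.perf_counter()


class TimedConnection(http.client.HTTPConnection):
    response_class = TimedResponse


class TimedHTTPHandler(urllib.request.HTTPHandler):
    def http_open(self, req):
        # Same as the stock handler, but with our connection class.
        return self.do_open(TimedConnection, req)


def timed_open(url):
    """Open an http:// URL and return (response, seconds until headers arrived)."""
    opener = urllib.request.build_opener(TimedHTTPHandler)
    start = time.perf_counter()
    response = opener.open(url)
    return response, response.headers_received_at - start
```

The second value returned by timed_open('http://example.com') is the time until the response headers were parsed, which is a reasonable stand-in for TTFB. Timing response.read() afterwards gives TTLB.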

To get a good approximation, call read(1) and measure the time.
It works pretty well for me.
The only thing to keep in mind: Python might load more than one byte on the call to read(1), depending on its internal buffers. But I think most tools will be similarly inaccurate.
import urllib2
import time
opener = urllib2.build_opener()
request = urllib2.Request('http://example.com')
start = time.time()
resp = opener.open(request)
# read one byte
resp.read(1)
ttfb = time.time() - start
# read the rest
resp.read()
ttlb = time.time() - start
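The same measurement can be sketched for Python 3 as below. The function name measure_timings is hypothetical. It takes a callable that opens the response, so the timing logic is separate from the choice of HTTP library, and it uses time.perf_counter(), which is better suited to measuring short intervals than time.time().

```python
import time


def measure_timings(open_response, clock=time.perf_counter):
    """Return (ttfb, ttlb, body) for a response opened by `open_response`.

    `open_response` is any zero-argument callable returning an object with
    a read() method, for example a lambda around urllib.request.urlopen.
    """
    start = clock()
    response = open_response()
    first = response.read(1)   # returns once at least the first byte is available
    ttfb = clock() - start
    rest = response.read()     # read the rest
    ttlb = clock() - start
    return ttfb, ttlb, first + rest
```

For example, measure_timings(lambda: urllib.request.urlopen('http://example.com')) returns both timings along with the body, after an "import urllib.request". The caveat about internal buffering in read(1) applies here as well.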
