Does urllib2 support threading to servers requiring basic auth? - python

I am developing an application which uses a series of REST calls to retrieve data. I have the basic application logic complete, and the structure for data retrieval is roughly as follows:
1) The initial data call is completed.
2) For each row in the initial response, a subsequent call is made to a REST service requiring basic authentication.
Performing these calls sequentially adds up to a long wait for the end user, so I am trying to implement threading to speed up the process (being IO bound makes this an ideal candidate for threading). The problem is that authentication fails on the threaded calls.
If I perform the calls sequentially then everything works fine, but with the threaded approach I end up with 401 authentication errors or 500 internal server errors from the server.
I have talked to the REST service admins and they know of nothing that would prevent concurrent connections from the same user on the server end, so I am wondering whether this is an issue on the urllib2 end.
Does anyone have any experience with this?
EDIT:
While I am unable to post the exact code I will post a reasonable representation of what I am doing with very similar structure.
import threading
import urllib2

class UrlThread(threading.Thread):
    def __init__(self, data):
        threading.Thread.__init__(self)
        self.data = data

    def run(self):
        password_manager = urllib2.HTTPPasswordMgrWithDefaultRealm()
        password_manager.add_password(None, 'https://url/to/Rest_Svc/', 'uid', 'passwd')
        auth_manager = urllib2.HTTPBasicAuthHandler(password_manager)
        opener = urllib2.build_opener(auth_manager)
        urllib2.install_opener(opener)
        option = self.data[0]
        urlToOpen = 'https://url/to/Rest_Svc/?option=' + option
        rawData = urllib2.urlopen(urlToOpen)
        wsData = rawData.readlines()
        if wsData:
            print('success')
# firstCallRows is a list of lists containing the data returned
# from the initial call I mentioned earlier.
thread_list = []
for row in firstCallRows:
    t = UrlThread(row)
    t.setDaemon(True)
    t.start()
    thread_list.append(t)

for thread in thread_list:
    thread.join()
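One thing worth noting about the snippet above: urllib2.install_opener() replaces the process-wide global opener, so every thread is racing to reinstall it while the others are mid-request. Whether or not that is the culprit here, each thread can simply call its own opener directly and avoid the shared global entirely. A minimal sketch of a run() method written that way (same placeholder URL and credentials as above):

    def run(self):
        password_manager = urllib2.HTTPPasswordMgrWithDefaultRealm()
        password_manager.add_password(None, 'https://url/to/Rest_Svc/', 'uid', 'passwd')
        auth_manager = urllib2.HTTPBasicAuthHandler(password_manager)
        opener = urllib2.build_opener(auth_manager)
        # opener.open() uses this opener only, so no state is shared
        # between threads and install_opener() is never needed.
        rawData = opener.open('https://url/to/Rest_Svc/?option=' + self.data[0])
        if rawData.readlines():
            print('success')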

With Requests you could do something like this:
from requests import session, async

auth = ('username', 'password')
url = 'http://example.com/api/'
options = ['foo1', 'foo2', 'foo3']

s = session(auth=auth)
rs = [async.get(url, params={'option': opt}, session=s) for opt in options]
responses = async.imap(rs)
for r in responses:
    print r.text
Relevant documentation:
Sessions
Asynchronous requests
Basic authentication
(Note that requests.async is gevent-based; in later releases of Requests it was split out into the separate grequests package.)

Related

Batching and queueing in a real-time webserver

I need a webserver which routes incoming requests to back-end workers by batching them every 0.5 seconds or once it has 50 HTTP requests, whichever happens earlier. What would be a good way to implement this in Python/Tornado or any other language?
What I am thinking is to publish the incoming requests to a RabbitMQ queue and then somehow batch them together before sending them to the back-end servers. What I can't figure out is how to pick multiple requests off the RabbitMQ queue. Could someone point me in the right direction or suggest an alternate approach?
I would suggest using a simple python micro web framework such as bottle. Then you would send the requests to a background process via a queue (thus allowing the connection to end).
The background process would then have a continuous loop that would check your conditions (time and number), and do the job once the condition is met.
Edit:
Here is an example webserver that batches the items before sending them to whatever queuing system you want to use (RabbitMQ always seemed overcomplicated to me with Python; I have used Celery and other simpler queuing systems before). That way the backend simply grabs a single 'item' from the queue that contains all 50 required requests.
import bottle
import threading
import Queue

app = bottle.Bottle()
app.queue = Queue.Queue()

def send_to_rabbitMQ(items):
    """Custom code to send to the RabbitMQ system"""
    print("50 items gathered, sending to RabbitMQ")

def batcher(queue):
    """Background thread that gathers incoming requests"""
    while True:
        batcher_loop(queue)

def batcher_loop(queue):
    """Loop that runs until it gathers 50 items,
    then calls the function 'send_to_rabbitMQ'"""
    count = 0
    items = []
    while count < 50:
        try:
            next_item = queue.get(timeout=.5)
        except Queue.Empty:
            pass
        else:
            items.append(next_item)
            count += 1
    send_to_rabbitMQ(items)

@app.route("/add_request", method=["PUT", "POST"])
def add_request():
    """Simple bottle request that grabs JSON and puts it in the queue"""
    request = bottle.request.json['request']
    app.queue.put(request)

if __name__ == '__main__':
    t = threading.Thread(target=batcher, args=(app.queue, ))
    t.daemon = True  # Make sure the background thread quits when the program ends
    t.start()
    bottle.run(app)
Code used to test it:
import requests
import json

for i in range(101):
    req = requests.post("http://localhost:8080/add_request",
                        data=json.dumps({"request": 1}),
                        headers={"Content-type": "application/json"})

Gevent async server with blocking requests

I have what I would think is a pretty common use case for Gevent. I need a UDP server that listens for requests, and based on the request submits a POST to an external web service. The external web service essentially only allows one request at a time.
I would like to have an asynchronous UDP server so that data can be immediately retrieved and stored so that I don't miss any requests (this part is easy with the DatagramServer gevent provides). Then I need some way to send requests to the external web service serially, but in such a way that it doesn't ruin the async of the UDP server.
I first tried monkey patching everything, which gave me a quick solution, but one in which my requests to the external web service were not rate limited in any way, and that resulted in errors.
It seems like what I need is a single non-blocking worker to send requests to the external web service in serial while the UDP server adds tasks to the queue from which the non-blocking worker is working.
What I need is information on running a gevent server with additional greenlets for other tasks (especially with a queue). I've been using the serve_forever function of the DatagramServer and think that I'll need to use the start method instead, but haven't found much information on how it would fit together.
Thanks,
EDIT
The answer worked very well. I've adapted the UDP server example code with the answer from @mguijarr to produce a working example for my use case:
from __future__ import print_function

# Patch the standard library before anything that uses sockets is imported.
import gevent.monkey
gevent.monkey.patch_all()

from gevent.server import DatagramServer
import gevent.queue
import urllib

n = 0

def process_request(q):
    while True:
        request = q.get()
        print(request)
        print(urllib.urlopen('https://test.com').read())

class EchoServer(DatagramServer):
    __q = gevent.queue.Queue()
    __request_processing_greenlet = gevent.spawn(process_request, __q)

    def handle(self, data, address):
        print('%s: got %r' % (address[0], data))
        global n
        n += 1
        print(n)
        self.__q.put(n)
        self.socket.sendto('Received %s bytes' % len(data), address)

if __name__ == '__main__':
    print('Receiving datagrams on :9000')
    EchoServer(':9000').serve_forever()
Here is how I would do it:
Write a function taking a "queue" object as argument; this function will continuously process items from the queue. Each item is supposed to be a request for the web service.
This function could be a module-level function, not part of your DatagramServer instance:
def process_requests(q):
    while True:
        request = q.get()
        # do your magic with 'request'
        ...
In your DatagramServer, run the function within a greenlet (like a background task):
self.__q = gevent.queue.Queue()
self.__request_processing_greenlet = gevent.spawn(process_requests, self.__q)
When you receive a UDP request in your DatagramServer instance, push it to the queue:
self.__q.put(request)
This should do what you want. You still call 'serve_forever' on DatagramServer, no problem.

Return HTTP response and continue processing - python bottle API

I have an analysis service using Python's bottle that is invoked by POSTing a URL pointing to the raw data.
What is the simplest/cleanest/best way to immediately return a 200 response to the caller and continue processing potentially long-running logic in the application?
Spawn a new thread/process? Use a queue? Async?
from bottle import post, HTTPResponse, request

@post('/postprocess')
def records():
    data_url = request.json.get('data_url', False)
    postback_url = request.json.get('postback_url', False)
    if not data_url or not postback_url:
        return HTTPResponse(status=400, body="Missing data parameter")

    # Immediately return a response to the caller
    return HTTPResponse(status=200, body="Complete")

    # Continue processing the long-running code (unreachable as written)
    result = get_and_process_data(data_url)
    # POST result to another endpoint
There is no single simplest solution to this. For a production system I would first look at existing systems built for these kinds of situations to see whether one is a good fit, and only develop something custom if none is. To that end, I recommend you take a look at Celery.
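For illustration, a minimal sketch of that approach (the broker URL and both helper functions are hypothetical; Celery also needs a running broker such as Redis or RabbitMQ):

# tasks.py
from celery import Celery

app = Celery('tasks', broker='redis://localhost:6379/0')  # hypothetical broker URL

@app.task
def process_data(data_url, postback_url):
    # The long-running work now happens in a Celery worker,
    # not in the web process.
    result = get_and_process_data(data_url)  # hypothetical helper from the question
    post_result_to(postback_url, result)     # hypothetical helper

The request handler then only calls process_data.delay(data_url, postback_url) and returns the 200 immediately.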
I would suggest using a queueing mechanism to queue the data that needs processing, then implementing a pool of workers to work through the queue; a bare-bones sketch follows below.
You can then take advantage of queue monitoring tools to iron out any performance issues, and your worker pool can scale up as needed.
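A sketch of that pattern with just the standard library (the worker count and the process() helper are placeholders, not from the answer):

import threading
import Queue

q = Queue.Queue()

def process(item):
    pass  # placeholder for the long-running work

def worker():
    while True:
        item = q.get()
        try:
            process(item)
        finally:
            q.task_done()

# Small fixed pool of daemon workers draining the queue.
for _ in range(4):
    t = threading.Thread(target=worker)
    t.daemon = True
    t.start()

The request handler then just calls q.put(data) and returns, much like the bottle example in the previous question.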
I've recently found a very straightforward solution to this.
from threading import Thread
from flask import Flask, Response, request

# This snippet is Flask-style (request.args, Response); the same
# spawn-a-thread-then-return pattern works in bottle as well.
app = Flask(__name__)

@app.route('/')
def send_response():
    uri = request.args
    ua = request.headers['user-agent']
    token = request.args.get("token")
    my_thread = Thread(target=process_response, args=[uri, ua, token])
    my_thread.start()
    res = Response(status=200, mimetype='application/json')
    return res

def process_response(uri, ua, token):
    print(uri)
    print(ua)
    print(token)

How to test a django API with asynchronous requests

I am developing an API using Django-TastyPie.
What does the API do?
It checks whether two or more requests are present on the server; if so, it swaps the data of the two requests and returns a JSON response after a 7-second delay.
What I need to do is send multiple asynchronous requests to the server to test this API.
I am using Django's unit test framework along with TastyPie to test this functionality.
Problem
The Django development server is single-threaded, so it does not support concurrent requests.
Solution tried:
I have tried to solve this by using multiprocessing:
class MatchResourceTest(ResourceTestCase):
    def setUp(self):
        super(MatchResourceTest, self).setUp()
        self.user = ""
        self.user_list = []
        self.thread_list = []

        # Create and get user
        self.assertHttpCreated(self.api_client.post('/api/v2/user/', format='json',
                                                    data={'username': '123456', 'device': 'abc'}))
        self.user_list.append(User.objects.get(username='123456'))

        # Create and get other_user
        self.assertHttpCreated(self.api_client.post('/api/v2/user/', format='json',
                                                    data={'username': '456789', 'device': 'xyz'}))
        self.user_list.append(User.objects.get(username='456789'))

    def get_credentials(self):
        return self.create_apikey(username=self.user.username, api_key=self.user.api_key.key)

    def get_url(self):
        resp = urllib2.urlopen(self.list_url).read()
        self.assertHttpOK(resp)

    def test_get_list_json(self):
        for user in self.user_list:
            self.user = user
            self.list_url = 'http://127.0.0.1:8000/api/v2/match/?name=hello'
            t = multiprocessing.Process(target=self.get_url)
            t.start()
            self.thread_list.append(t)
        for t in self.thread_list:
            t.join()
        print ContactCardShare.objects.all()
Please suggest a solution for testing this API by sending asynchronous requests,
or
any app, library, or anything else that allows the Django development server to handle multiple requests asynchronously.
As far as I know, django's development server is multi-threaded.
I'm not sure this test is formatted correctly though. The test setUp shouldn't include tests itself, it should be foolproof data insertion by creating entries. The post should have it's own test.
See the tastypie docs for an example test case.
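If you need a real server to fire overlapping requests at, Django's LiveServerTestCase starts a live HTTP server for the duration of the test and exposes its address as self.live_server_url. A rough sketch reusing the question's endpoint; note that whether the live server actually handles requests concurrently depends on your Django version (older versions serve one request at a time):

import multiprocessing
import urllib2

from django.test import LiveServerTestCase

def fetch(url):
    # Runs in a separate process so the two requests overlap in time.
    return urllib2.urlopen(url).read()

class MatchConcurrencyTest(LiveServerTestCase):
    def test_concurrent_requests(self):
        url = self.live_server_url + '/api/v2/match/?name=hello'
        procs = [multiprocessing.Process(target=fetch, args=(url,))
                 for _ in range(2)]
        for p in procs:
            p.start()
        for p in procs:
            p.join()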

Using python Requests library to consume from Twitter's user streams - how to detect disconnection?

I'm trying to use Requests to create a robust way of consuming from Twitter's user streams. So far, I've produced the following basic working example:
"""
Example of connecting to the Twitter user stream using Requests.
"""
import sys
import json
import requests
from oauth_hook import OAuthHook
def userstream(access_token, access_token_secret, consumer_key, consumer_secret):
oauth_hook = OAuthHook(access_token=access_token, access_token_secret=access_token_secret,
consumer_key=consumer_key, consumer_secret=consumer_secret,
header_auth=True)
hooks = dict(pre_request=oauth_hook)
config = dict(verbose=sys.stderr)
client = requests.session(hooks=hooks, config=config)
data = dict(delimited="length")
r = client.post("https://userstream.twitter.com/2/user.json", data=data, prefetch=False)
# TODO detect disconnection somehow
# https://github.com/kennethreitz/requests/pull/200/files#L13R169
# Use a timeout? http://pguides.net/python-tutorial/python-timeout-a-function/
for chunk in r.iter_lines(chunk_size=1):
if chunk and not chunk.isdigit():
yield json.loads(chunk)
if __name__ == "__main__":
import pprint
import settings
for obj in userstream(access_token=settings.ACCESS_TOKEN, access_token_secret=settings.ACCESS_TOKEN_SECRET, consumer_key=settings.CONSUMER_KEY, consumer_secret=settings.CONSUMER_SECRET):
pprint.pprint(obj)
However, I need to be able to handle disconnections gracefully. Currently, when the stream disconnects, the above just hangs, and there are no exceptions raised.
What would be the best way to achieve this? Is there a way to detect this through the urllib3 connection pool? Should I use a timeout?
I would recommend adding a timeout parameter to the client.post() call. http://docs.python-requests.org/en/latest/user/quickstart/#timeouts
However, it is important to note that requests doesn't set the TCP timeout, so you could set that using the following:
import socket
socket.setdefaulttimeout(TIMEOUT)
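Putting the two together, one way to treat prolonged silence as a disconnect is to pick a timeout comfortably above Twitter's keep-alive interval (the user stream sends a newline roughly every 30 seconds) and reconnect whenever a read times out. A rough sketch built around the userstream() generator and settings module from the question (the 90-second value is an assumption, not from the original answer):

import socket
import pprint
import settings

socket.setdefaulttimeout(90)  # assumed value, well above the ~30 s keep-alives

while True:
    try:
        for obj in userstream(access_token=settings.ACCESS_TOKEN,
                              access_token_secret=settings.ACCESS_TOKEN_SECRET,
                              consumer_key=settings.CONSUMER_KEY,
                              consumer_secret=settings.CONSUMER_SECRET):
            pprint.pprint(obj)
    except socket.timeout:
        pass  # no data within the timeout: assume a disconnect and reconnect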
