Python library for HTTP support - including Content-Encoding - python

I have a scraper which queries different websites. Some of them use Content-Encoding, in varying forms. And since I'm trying to simulate an AJAX query and need to mimic Mozilla, I need full support. There are multiple HTTP libraries for Python, but none seems complete:
httplib seems pretty low level, more like an HTTP packet sniffer really.
urllib2 is some sort of elaborate hoax. There are a dozen handlers for various web client functions, but mandatory HTTP features like Content-Encoding apparently aren't among them.
mechanize: is nice, already somewhat overkill for my tasks, but only supports CE 'gzip'.
httplib2: sounded most promising, but actually fails on 'deflate' encoding, because of the disparity between raw deflate and zlib streams.
So are there any other options? I can't believe I'm expected to reimplement workarounds for the above libraries. And it's not a good idea to distribute patched versions alongside my application, because packagers might remove them again if the corresponding library is available as a separate distribution package.
I almost don't dare to say it, but the HTTP functions API in PHP is much nicer. And besides Content-Encoding:*, I might at some point need multipart/form-data too. So, is there a comprehensive 3rd party library for HTTP retrieval?
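For reference, the raw deflate vs. zlib disparity mentioned above can be worked around with the standard library alone. This is only a minimal sketch, not what any of the listed libraries do internally; decode_body is a hypothetical helper name, and it assumes the whole response body is already in memory (Python 3).

```python
import gzip
import zlib


def decode_body(body, content_encoding):
    """Decode an HTTP body according to its Content-Encoding value.

    Hypothetical helper for illustration; handles only identity, gzip and deflate.
    """
    encoding = (content_encoding or "").strip().lower()
    if encoding in ("", "identity"):
        return body
    if encoding in ("gzip", "x-gzip"):
        return gzip.decompress(body)
    if encoding == "deflate":
        try:
            # 'deflate' is supposed to be a zlib-wrapped stream (with header).
            return zlib.decompress(body)
        except zlib.error:
            # Some servers send a raw deflate stream without the zlib header;
            # negative wbits tells zlib not to expect one.
            return zlib.decompress(body, -zlib.MAX_WBITS)
    raise ValueError("unsupported Content-Encoding: %r" % content_encoding)
```

It would be called with the raw body and the value of the response's Content-Encoding header, e.g. decode_body(raw_bytes, "deflate").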

I would consider either invoking a child process of cURL or using python bindings for libcurl.
From this description cURL seems to support gzip and deflate.

Beautiful Soup might work. Just throwing it out there.

Related

Zeep vs Requests for SOAP APIs in Python

So I know that Python's requests module can be used to handle REST APIs, and as per this answer here, requests can also handle SOAP APIs as well.
I've only worked with REST APIs so far, but the few people I know who work with SOAP APIs almost always use another module, zeep.
Is there anything I'm missing?
Why is there a need for a whole separate module when it is possible using requests as well, and more importantly, why doesn't anybody just use requests instead of zeep?
Ease of use and reusability. While you could use the requests module and write raw XML, Zeep is a high-level interface which generates the XML automatically.
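To give a feel for what "writing raw XML" means, here is a rough sketch of building a SOAP 1.1 envelope by hand with the standard library. The operation name, service namespace and parameter below are made up for illustration; a real service defines them in its WSDL, which is exactly what Zeep reads for you.

```python
import xml.etree.ElementTree as ET

# SOAP 1.1 envelope namespace
SOAP_ENV_NS = "http://schemas.xmlsoap.org/soap/envelope/"


def build_soap_envelope(operation, service_ns, params):
    """Build a SOAP 1.1 request envelope as UTF-8 bytes.

    Hypothetical helper: 'operation', 'service_ns' and 'params' must match
    whatever the target service's WSDL actually declares.
    """
    ET.register_namespace("soap", SOAP_ENV_NS)
    envelope = ET.Element("{%s}Envelope" % SOAP_ENV_NS)
    body = ET.SubElement(envelope, "{%s}Body" % SOAP_ENV_NS)
    call = ET.SubElement(body, "{%s}%s" % (service_ns, operation))
    for name, value in params.items():
        # Every parameter becomes a child element of the operation element.
        ET.SubElement(call, "{%s}%s" % (service_ns, name)).text = str(value)
    return ET.tostring(envelope, encoding="utf-8")


# Made-up operation and namespace, for illustration only.
envelope_bytes = build_soap_envelope(
    "GetQuote", "http://example.com/stock", {"symbol": "ABC"}
)
```

With requests you would then POST envelope_bytes yourself (setting a text/xml Content-Type and usually a SOAPAction header) and parse the XML response by hand, for every operation you call. That bookkeeping is the part Zeep takes over.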

Publishing to crossbar.io from python daemon

I would like to use crossbar.io to display real-time stats on the web about a long-running python daemon. The displaying part works fine using AutobahnJS, but I struggle with the part that posts stats to crossbar.io. All the example code I read runs on twisted or asyncio, and my daemon doesn't (and won't). For pure WebSockets, there's the websocket_client package which does exactly what I would like to do, just not on WAMP. Is there a similar library, or am I missing something in the docs?
I'm using crossbar.io over pure WebSockets because I like the PubSub abstraction. I know I can re-implement it in WebSockets without a lot of additional work, but that's something I'd like to avoid.
I finally found a similar question; the solution is to use crossbar's HTTP Publisher service. There's also the crossbarconnect package, which conveniently wraps all the necessary HTTP action. Sweet and short :-)
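For anyone who wants to skip the extra package: as far as I understand the HTTP Publisher, it accepts a plain POST with a JSON body carrying "topic", "args" and optionally "kwargs". Below is a minimal sketch using only the standard library. The URL and topic in the usage note are assumptions that depend on your node configuration, build_publish_request is a hypothetical helper name, and signed requests (key/secret) are not covered.

```python
import json
import urllib.request


def build_publish_request(url, topic, *args, **kwargs):
    """Build (but do not send) a POST request for crossbar's HTTP Publisher.

    Hypothetical helper; 'url' must point at the path where the publisher
    service is configured on your node.
    """
    payload = {"topic": topic, "args": list(args)}
    if kwargs:
        payload["kwargs"] = kwargs
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

In the daemon this would be used roughly as req = build_publish_request("http://127.0.0.1:8080/publish", "com.example.stats", 42), followed by urllib.request.urlopen(req, timeout=5). No twisted or asyncio is needed on the publishing side.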

Monitor outgoing internet request in Python

What are some useful methods or libraries that can be used to track outgoing IP requests from a personal computer? Ideally I would like the option to block or pause a specific outgoing request before/after some checks are carried out. I've seen Twisted, but I'm not sure if it's exactly what I'm looking for just yet, or if simpler methods exist for doing this. I'm not looking for a standalone application, as there are other features that will be built around this for a specific purpose.
Language: Preferably in Python, but C/C++ are possible options as well.
OS: The current target is Linux (Ubuntu). However, cross-platform options would be best.
Twisted will make it easy to get up and running right away while making it possible for you to intercept, delay, or block requests: http://twistedmatrix.com/documents/current/api/twisted.web.proxy.Proxy.html

JSON-RPC server via Python

I need to implement a JSON-RPC server like this:
http://pasha.cdemo.applicationcraft.com/service/json
This server will be accessed from jQuery and I have to use Python for writing it.
What library should I use? Can you also give me an example of using that library?
Thanks.
I found cherrypy very easy to use (doesn't come with a predefined template engine or a database model, so it's IMO better than others when your server is producing json and is not a typical database).
Coupled with nginx and memcached, it can also be quite performant...
Python 2.6 comes with a json module in the standard library, which allows you to convert Python data structures to JSON responses effectively.
For HTTP communications and request handling, you can use Python web frameworks like Pyramid or Django, or HTTP server software like Tornado. It really depends on what you need to process in your JSON-RPC calls.
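Since an example was asked for: whichever framework handles the HTTP side, the JSON-RPC part itself can be sketched with nothing but the json module. This is only a minimal, framework-agnostic sketch assuming JSON-RPC 2.0 single requests (no batches or notifications); METHODS, rpc_method, add and handle_jsonrpc are made-up names.

```python
import json

# Registry of callable RPC methods (hypothetical, for illustration).
METHODS = {}


def rpc_method(func):
    """Register a function under its own name as an RPC method."""
    METHODS[func.__name__] = func
    return func


@rpc_method
def add(a, b):
    return a + b


def _error(request_id, code, message):
    return json.dumps({"jsonrpc": "2.0", "id": request_id,
                       "error": {"code": code, "message": message}})


def handle_jsonrpc(raw_request):
    """Take a JSON-RPC 2.0 request string and return the response string."""
    try:
        request = json.loads(raw_request)
    except ValueError:
        return _error(None, -32700, "Parse error")
    if not isinstance(request, dict) or not isinstance(request.get("method"), str):
        return _error(None, -32600, "Invalid Request")
    request_id = request.get("id")
    func = METHODS.get(request["method"])
    if func is None:
        return _error(request_id, -32601, "Method not found")
    params = request.get("params", [])
    try:
        # Named parameters arrive as an object, positional ones as an array.
        result = func(**params) if isinstance(params, dict) else func(*params)
    except TypeError:
        # Note: this also catches TypeErrors raised inside the method itself.
        return _error(request_id, -32602, "Invalid params")
    return json.dumps({"jsonrpc": "2.0", "id": request_id, "result": result})
```

In a cherrypy, Django or Tornado handler you would pass the POST body to handle_jsonrpc and send the returned string back with an application/json content type.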

twisted http client

I am after an example describing the usage of Twisted's HTTP Client.
After reading the excellent blog post on the internals of Twisted, I understand how the "Factory" and "Protocol" components play their role but I am unclear on how to introduce "Request" in the overall Client flow.
More specifically, I need to be able to perform HTTP GET and POST requests to a remote server using Twisted.
Updated: after a discussion on irc #twisted / #python, it seems that twisted.web2 is fading away in favor of beefing up functionality on twisted.web e.g. Agent.
As of Twisted 9.0, there are actually two HTTP clients available. The older one has quite a few features, such as automatically following redirects, interpreting cookie headers, etc. You can find an example of its usage here:
http://twistedmatrix.com/documents/current/web/examples/
(getpage.py and dlpage.py)
Unfortunately, the interface presented by the older client makes a number of common tasks difficult. For example, using getPage, you cannot examine arbitrary response headers.
The newer HTTP client isn't yet as featureful as the old one, but it presents an interface intended to eliminate the limitations of getPage. It is also intended to be more easily extended and customized. You can find a document describing its usage here:
http://twistedmatrix.com/documents/current/web/howto/client.html
I started using treq with twisted. treq has an API which is very similar to Requests.
https://pypi.python.org/pypi/treq/0.2.0
As of Twisted 10, you may want to use the Agent class.
Please follow this link:
http://twistedmatrix.com/documents/10.2.0/web/howto/client.html
