handle url-redirects in asynchronous code - python

Recently I decided to port a project from requests to grequests to make asynchronous HTTP requests for efficiency. One problem I now face
is that in my previous code, when there was a redirect, I could handle it with the following snippet:
import requests

req = requests.get('a domain')  # 'a domain' is a placeholder URL
if req.history:
    print(req.url)  # that would be the final URL after the redirect
In this way I could retrieve the last-visited URL after the redirect.
So my question is whether it's possible to achieve something similar with grequests.
Reading the docs, I didn't manage to find a solution.
Thanks in advance!
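For what it's worth, grequests.map() returns ordinary requests.Response objects, so the same history check should carry over; a minimal sketch, with a placeholder URL list:

import grequests

urls = ['http://example.com']  # placeholder list of URLs
pending = (grequests.get(u) for u in urls)
for resp in grequests.map(pending):
    # map() yields plain requests.Response objects (or None on failure)
    if resp is not None and resp.history:
        print(resp.url)  # the final URL after the redirect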

Related

Async HTTP server with scrapy and mongodb in python

I am basically trying to start an HTTP server which will respond with content from a website that I crawl using Scrapy. In order to start crawling the website I need to log in to it, and to do so I need to access a DB with credentials and such. The main issue here is that I need everything to be fully asynchronous, and so far I am struggling to find a combination that makes everything work properly without many sloppy implementations.
I already got Klein + Scrapy working, but when I get to implementing DB access I get all messed up in my head. Is there any way to make PyMongo asynchronous with Twisted or something? (Yes, I have seen TxMongo, but the documentation is quite bad and I would like to avoid it. I have also found an implementation with adbapi, but I would like something more similar to PyMongo.)
Thinking it through the other way around, I'm sure aiohttp has many more options for implementing async DB access and stuff, but then I find myself at an impasse with Scrapy integration.
I have seen things like scrapa, scrapyd and ScrapyRT but those don't really work for me. Are there any other options?
Finally, if nothing works, I'll just use aiohttp, and instead of Scrapy I'll make the requests to the website manually and use BeautifulSoup or something like that to get the info I need from the response. Any advice on how to proceed down that road?
Thanks for your attention, I'm quite a noob in this area so I don't know if I'm making complete sense. Regardless, any help will be appreciated :)
Is there any way to make pymongo asynchronous with twisted
No. pymongo is designed as a synchronous library, and there is no way to make it asynchronous without basically rewriting it (you could use threads or processes, but that is not what you asked, and you can also run into thread-safety issues).
Thinking it through the other way around, I'm sure aiohttp has many more options for implementing async DB access and stuff
It doesn't. aiohttp is an HTTP library - it can do HTTP asynchronously, and that is all; it has nothing to help you access databases. You'd basically have to rewrite pymongo on top of it.
Finally, if nothing works, I'll just use aiohttp, and instead of Scrapy I'll make the requests to the website manually and use BeautifulSoup or something like that to get the info I need from the response.
That means a lot of work just to avoid using Scrapy, and it won't help you with the pymongo issue - you would still have to rewrite pymongo!
My suggestion is - learn txmongo! If you can't, and you'd rather write it yourself, use twisted.web instead of aiohttp, since then you can keep using Scrapy!
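To give a feel for it, here is a rough txmongo sketch (the database, collection, and query are made up); the API mirrors pymongo's, except every call returns a Deferred:

from twisted.internet import defer, reactor
import txmongo

@defer.inlineCallbacks
def fetch_user():
    # MongoConnection returns a Deferred that fires once connected
    mongo = yield txmongo.MongoConnection()  # defaults to localhost:27017
    users = mongo.mydb.users                 # made-up database/collection
    doc = yield users.find_one({"name": "alice"})
    print(doc)

fetch_user().addCallback(lambda _: reactor.stop())
reactor.run()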

Accessing MongoDB Database on nodejs server using HTTP GET requests

I have a Node.js server set up on AWS with MongoDB. I want to access the database contents using the GET method. There is another application, in Python, that needs to access this database on AWS. I searched the internet and came across PycURL, but I don't quite get how to use it. How should I approach PycURL, or what could be an alternate solution?
You can build a RESTful API to handle those GET requests. There's a great tutorial (with the example you want at the bottom):
https://scotch.io/tutorials/build-a-restful-api-using-node-and-express-4
Edit: If you want Python code for GET requests, there is a good answer here: Simple URL GET/POST function in Python
Edit 2: Here's an example of how this would work. First you code your API to handle GET requests on some route (example: http://localhost:5000/api/getUsers). Then you make a GET request to that route using Python:
Example:
import requests

r = requests.get("http://localhost:5000/api/getUsers")  # r.json() gives the parsed response
I had a similar problem a while ago; there is a tutorial here. It can point you in your intended direction. The drawback may be that in the tutorial (if I remember correctly) they used Postman to issue the HTTP request, but I'm sure you can still use PyCURL.
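And if you do end up going the PyCURL route instead of Requests, the equivalent GET against the same hypothetical route looks roughly like this:

import pycurl
from io import BytesIO

buf = BytesIO()
c = pycurl.Curl()
c.setopt(pycurl.URL, "http://localhost:5000/api/getUsers")  # hypothetical route from above
c.setopt(pycurl.WRITEDATA, buf)  # collect the response body
c.perform()
c.close()
print(buf.getvalue().decode("utf-8"))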

Which response goes with which request?

I'm using Python with OpenERP 7.
I fire requests to partners using urllib3. Some of these requests may be asynchronous.
So I've built a little asyncore server to wait for responses.
But the thing is, the server cannot know which response goes with which request.
In the content of my request, I have a tag named TransactionID.
So far, I've tried to link responses to requests using this TransactionID.
But the response format is not the same from one partner to another.
So what I've done is create a list of possible TransactionID tag structures.
This method works, but it's so ugly.
I was hoping for a better, cleaner solution, if someone knows how to achieve that.
Thanks!
Edit:
I think I made a mistake by calling it asynchronous.
The partner gives a synchronous response, but it's just to confirm that my request is OK.
Later, the partner sends the actual response to a specific url:port on my server. This is the response I'm talking about. Sorry if I didn't give enough details.
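One common pattern for this kind of correlation problem - a sketch, assuming you can extract the TransactionID from each incoming callback, whatever shape the partner gives it - is to register every outgoing request in a dict keyed by its TransactionID and look the context up when the callback arrives:

import threading

pending = {}              # TransactionID -> context of the original request
pending_lock = threading.Lock()

def register_request(transaction_id, context):
    # call this right before firing the outbound request
    with pending_lock:
        pending[transaction_id] = context

def handle_callback(transaction_id, payload):
    # called by the server when a partner posts its response
    with pending_lock:
        context = pending.pop(transaction_id, None)
    if context is None:
        return  # unknown or already-handled transaction
    # ... process payload using the original request's context ...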

API GET requests from Specific IP - Requests Library - Python

I'm looking to switch existing PHP code over to Python using the Requests library. The PHP code sends thousands of GET requests to an API to get needed data. The API limits GET requests to one every 6 seconds per IP. We have numerous IP addresses in order to pull faster. The faster the better in this case.
My question is: is there a way to send the GET requests from different IP addresses using the Requests library? I'm also open to using different libraries in Python, or different methods of switching the IP addresses.
The current code makes use of curl_multi_exec with the CURLOPT_INTERFACE setting.
As far as code goes, I don't necessarily need code examples; I'm looking more for a direction or an option that allows this in Python. I would prefer not to post code, but if it's necessary, let me know.
Thanks!
I don't believe Requests supports setting the outbound interface.
There is a Python cURL binding, pycurl, though.
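pycurl exposes the same CURLOPT_INTERFACE option your PHP code uses; a minimal sketch, with the URL and local IP as placeholders:

import pycurl
from io import BytesIO

def get_from_ip(url, local_ip):
    # bind the outbound connection to a specific local IP (or interface name),
    # mirroring PHP's CURLOPT_INTERFACE
    buf = BytesIO()
    c = pycurl.Curl()
    c.setopt(pycurl.URL, url)
    c.setopt(pycurl.INTERFACE, local_ip)
    c.setopt(pycurl.WRITEDATA, buf)
    c.perform()
    c.close()
    return buf.getvalue()

data = get_from_ip("https://api.example.com/data", "192.168.1.10")  # placeholders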

Python: What's the difference between httplib2 and urllib2?

I'm trying to implement an OAuth2 authentication server, and for the client part I wanted to send a JSON request to the server (from a Django view). I found several libraries to do that, though the most common are httplib2 and urllib2. I was wondering what the difference between them is, and which is the best library for this purpose.
Thanks in advance.
Edit:
After searching, I found an extremely useful library called Requests, and I have used it ever since. (http://docs.python-requests.org/en/latest/)
urllib2 handles opening and reading URLs. It also handles extra stuff like storing cookies.
httplib handles HTTP requests; it's what happens behind the curtain when you open a URL.
You can send a JSON request with urllib2, so you should use that.
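For example, a JSON POST with urllib2 (Python 2; the endpoint and payload here are made up) looks roughly like this:

import json
import urllib2

payload = json.dumps({"grant_type": "client_credentials"})  # made-up payload
req = urllib2.Request(
    "https://auth.example.com/token",  # placeholder endpoint
    data=payload,
    headers={"Content-Type": "application/json"},
)
print(urllib2.urlopen(req).read())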
