I'm using the Unirest library to make async web requests in Python. I've read the documentation, but I couldn't find whether I can use a proxy with it. Maybe I'm just blind and there's a way to do it with Unirest?
Or is there some other way to specify a proxy in Python? The proxies need to be changed from within the script after making some requests, so whatever approach I use should allow that.
Thanks in advance.
I know nothing about Unirest, but in all the scripts I wrote that required proxy support I used the SocksiPy module (http://socksipy.sourceforge.net). It supports HTTP, SOCKS4 and SOCKS5 and it's really easy to use. :)
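For reference, here's a minimal sketch of how SocksiPy is typically wired in; the proxy host and port below are placeholders:

import socket
import socks   # SocksiPy
import urllib2

# Route every new socket through a SOCKS5 proxy (host/port are placeholders).
socks.setdefaultproxy(socks.PROXY_TYPE_SOCKS5, "127.0.0.1", 1080)
socket.socket = socks.socksocket

# Anything built on the standard socket module now goes through the proxy.
print(urllib2.urlopen("http://example.com").read())

Calling setdefaultproxy again with different arguments lets you switch proxies from within the script between requests.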
Would something like this work for you?
[1] https://github.com/obriencj/python-promises
I am basically trying to start an HTTP server that responds with content from a website I crawl using Scrapy. In order to start crawling the website I need to log in to it, and to do so I need to access a DB with credentials and such. The main issue here is that I need everything to be fully asynchronous, and so far I am struggling to find a combination that makes everything work properly without a lot of sloppy implementations.
I already have Klein + Scrapy working, but when I get to implementing DB accesses I get all messed up in my head. Is there any way to make PyMongo asynchronous with Twisted or something? (Yes, I have seen TxMongo, but the documentation is quite bad and I would like to avoid it. I have also found an implementation with adbapi, but I would like something more similar to PyMongo.)
Trying to think things through the other way around, I'm sure aiohttp has many more options to implement async DB accesses and stuff, but then I find myself at an impasse with Scrapy integration.
I have seen things like scrapa, scrapyd and ScrapyRT but those don't really work for me. Are there any other options?
Finally, if nothing works, I'll just use aiohttp, and instead of Scrapy I'll make the requests to the website manually and use BeautifulSoup or something like that to get the info I need from the response. Any advice on how to proceed down that road?
Thanks for your attention. I'm quite a noob in this area, so I don't know if I'm making complete sense. Regardless, any help will be appreciated :)
Is there any way to make pymongo asynchronous with twisted
No. PyMongo is designed as a synchronous library, and there is no way you can make it asynchronous without basically rewriting it (you could use threads or processes, but that is not what you asked; you can also run into thread-safety issues in your code).
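If you do decide to go the thread route with Twisted anyway, a rough sketch using deferToThread could look like this (the connection string, database, collection and query are made-up placeholders):

from pymongo import MongoClient
from twisted.internet.threads import deferToThread

client = MongoClient("mongodb://localhost:27017")  # placeholder connection string

def _find_user(username):
    # Blocking PyMongo call; runs in Twisted's thread pool.
    return client.mydb.users.find_one({"name": username})

def get_user(username):
    # Returns a Deferred that fires with the query result.
    return deferToThread(_find_user, username)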
Trying to think things through the other way around I'm sure aiohttp has many more options to implement async db accesses and stuff
It doesn't. aiohttp is an HTTP library: it can do HTTP asynchronously and that is all; it has nothing to help you access databases. You'd basically have to rewrite PyMongo on top of it.
Finally, if nothing works, I'll just use aiohttp, and instead of Scrapy I'll make the requests to the website manually and use BeautifulSoup or something like that to get the info I need from the response.
That means a lot of work just to avoid using Scrapy, and it won't help you with the PyMongo issue: you still have to rewrite PyMongo!
My suggestion is: learn TxMongo! If you can't, and you want to rewrite it, use twisted.web instead of aiohttp, since then you can keep using Scrapy!
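To give an idea of what the TxMongo route looks like, here is a minimal sketch based on its basic usage (the database, collection and query are placeholders):

import txmongo
from twisted.internet import defer, reactor

@defer.inlineCallbacks
def fetch_credentials():
    connection = yield txmongo.MongoConnection()      # localhost:27017 by default
    users = connection.mydb.users                     # placeholder db/collection
    doc = yield users.find_one({"name": "someuser"})  # fires asynchronously
    print(doc)

fetch_credentials().addCallback(lambda _: reactor.stop())
reactor.run()

The calls look very much like PyMongo, except every query returns a Deferred, which is exactly what lets it coexist with Klein and Scrapy in one reactor.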
Recently I decided to port a project from requests to grequests in order to make asynchronous HTTP requests for efficiency. One problem I face now is that in my previous code, whenever there was a redirect I could handle it with the following snippet:
import requests

req = requests.get('http://example.com')  # placeholder URL
if req.history:
    print req.url  # that would be the last domain after the redirect
This way I could retrieve the last-visited URL after the redirect.
So my question is whether it's possible to achieve something similar with grequests.
Reading the docs, I didn't manage to find a possible solution.
Thanks in advance!
I have written a Python server that performs a task based on input given by the user through a client. Unfortunately, this requires the user to use the terminal.
I'd like the user to use a browser instead to send the data to the server. How would I go about this?
Does anyone here have suggestions? Perhaps even an example?
Thank you all in advance,
This is a very subjective question and depends on what exactly you are trying to achieve, but if you want to write a program with an embedded HTTP server then you could use either Tornado or Twisted. I've spent some time with both and found that Tornado is a bit cleaner and easier to write a web API with, but Twisted is more versatile if you want to handle different types of network connections.
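As a rough illustration of the Tornado option, a minimal embedded HTTP server might look like this (the port, handler and parameter name are just placeholders):

import tornado.ioloop
import tornado.web

class MainHandler(tornado.web.RequestHandler):
    def get(self):
        # Echo back a query parameter sent from the browser, e.g. /?task=foo
        task = self.get_argument("task", "nothing")
        self.write("you asked for: " + task)

application = tornado.web.Application([(r"/", MainHandler)])
application.listen(8888)
tornado.ioloop.IOLoop.instance().start()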
Answering my own question for future reference or for other people with similar requests.
All of the requirements for this can be found in the standard library module BaseHTTPServer.
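For example, a minimal sketch along those lines (the handler, port and response text are placeholders):

import BaseHTTPServer

class Handler(BaseHTTPServer.BaseHTTPRequestHandler):
    def do_GET(self):
        # self.path holds whatever the browser requested, e.g. /?task=foo
        self.send_response(200)
        self.send_header("Content-Type", "text/plain")
        self.end_headers()
        self.wfile.write("you asked for: " + self.path)

server = BaseHTTPServer.HTTPServer(("localhost", 8000), Handler)
server.serve_forever()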
I'm working on a script that will upload videos to YouTube using different accounts. Is there a way to use HTTPS or SOCKS proxies to route all the requests? My client doesn't want to leave any footprints for Google. The only way I found was to set the proxy environment variable beforehand, but that seems cumbersome. Is there something I'm missing?
Thanks :)
Setting an environment variable (e.g. import os; os.environ['BLAH']='BLUH') once at the start of your program "seems cumbersome"?! What counts as "non-cumbersome" for you, pray?
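For what it's worth, a minimal sketch of doing exactly that from within the script (the proxy URL is a placeholder; libraries that honour the standard proxy environment variables will pick it up):

import os

def set_proxy(proxy_url):
    # proxy_url is a placeholder, e.g. "http://user:pass@10.0.0.1:8080".
    # Libraries that respect the standard proxy environment variables
    # (requests, for instance) will use it for subsequent requests.
    os.environ["HTTP_PROXY"] = proxy_url
    os.environ["HTTPS_PROXY"] = proxy_url

set_proxy("http://127.0.0.1:8080")

Calling set_proxy again between uploads lets you switch proxies per account without restarting the program.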
What would be the best library for multithreaded harvesting/downloading with multiple-proxy support? I've looked at Tkinter; it looks good, but there are so many options. Does anyone have a specific recommendation? Many thanks!
Twisted
Is this something you can't just do by passing a URL to newly spawned threads and calling urllib2.urlopen in each one, or is there a more specific requirement?
Also take a look at http://scrapy.org/, which is a scraping framework built on top of Twisted.
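To illustrate the thread + urllib2 approach mentioned above, here is a minimal sketch with a per-thread proxy (the URLs and proxy address are placeholders):

import threading
import urllib2

def fetch(url, proxy=None):
    # Build an opener with an optional proxy (placeholder address below).
    handlers = [urllib2.ProxyHandler({"http": proxy})] if proxy else []
    opener = urllib2.build_opener(*handlers)
    data = opener.open(url).read()
    print("%s: %d bytes" % (url, len(data)))

urls = ["http://example.com", "http://example.org"]
threads = [threading.Thread(target=fetch, args=(u, "http://127.0.0.1:8080"))
           for u in urls]
for t in threads:
    t.start()
for t in threads:
    t.join()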