I think requests.get should be blocking, so there should be no difference between run and run2.
import sys
import gevent
import requests
from gevent import monkey
monkey.patch_all()

def download():
    requests.get('http://www.baidu.com').status_code

def run():
    ls = [gevent.spawn(download) for i in range(100)]
    gevent.joinall(ls)

def run2():
    for i in range(100):
        download()

if __name__ == '__main__':
    from timeit import Timer
    t = Timer(stmt="run();", setup="from __main__ import run")
    print('good', t.timeit(3))
    t = Timer(stmt="run2();", setup="from __main__ import run2")
    print('bad', t.timeit(3))
    sys.exit(0)
but the result is:
good 5.006664161000117
bad 29.077525214999696
so can all kinds of reads and writes be sped up by gevent?
PS: I ran it on mac / python3 / requests 2.10.0 / gevent 1.1.2
From the gevent website:
Fast event loop based on libev (epoll on Linux, kqueue on FreeBSD).
Lightweight execution units based on greenlet.
API that re-uses concepts from the Python standard library (for example there are gevent.event.Events and gevent.queue.Queues).
Cooperative sockets with SSL support
DNS queries performed through threadpool or c-ares.
Monkey patching utility to get 3rd party modules to become cooperative
Basically, just looping over a bunch of requests.get() calls is slow because each call blocks until its response has arrived before the next request starts. The gevent version is fast because, after monkey patching, the blocking socket operations inside requests.get() become cooperative: whenever one greenlet is waiting on the network, gevent switches to another, so the 100 requests overlap instead of running one after the other.
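As for the follow-up question of whether every kind of read and write gets sped up: only operations that spend time waiting on I/O benefit, because that waiting is what gevent overlaps; CPU-bound work still runs one greenlet at a time. A minimal sketch of my own (not from the answer above), using the same monkey patching as the question:

from gevent import monkey
monkey.patch_all()

import time
import gevent

def io_wait():
    # time.sleep is patched, so this yields to other greenlets while "waiting"
    time.sleep(1)

def cpu_work():
    # pure CPU work never yields, so the greenlets run it one after another
    sum(range(10 ** 7))

start = time.time()
gevent.joinall([gevent.spawn(io_wait) for _ in range(10)])
print('10 waits:', round(time.time() - start, 1), 'seconds')    # ~1s, the waits overlap

start = time.time()
gevent.joinall([gevent.spawn(cpu_work) for _ in range(10)])
print('10 CPU jobs:', round(time.time() - start, 1), 'seconds') # roughly the sum of all jobs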
Related
I have a Python 2 Django project, started with gunicorn, and the code writes a lot of threading.currentThread().xxxxx = 'some value'.
Because the coroutines reuse the same thread, I am curious how gevent guarantees that the currentThread attribute set in coroutine A (Thread 1) will not affect coroutine B (same Thread 1).
After all, the code is written as:
import threading
threading.currentThread().xxxxx = 'ABCD'
Instead of
import gevent
gevent.currentCoroutine().xxxxx = 'ABCD'  # simulating my guess
Thanks for your help.
It doesn't as far as I'm aware. Normal Gevent coroutines run in the same thread - if you modify something on that thread in one coroutine, it will be modified in the other coroutine as well.
If this is a question about gunicorn, that's a different matter and the following answer has some great detail on that - https://stackoverflow.com/a/41696500/7970018.
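A minimal sketch of my own (not from the answer above) showing that sharing without any monkey patching: both greenlets run on the one real thread object, so an attribute set by one is visible to, and overwritten by, the other. (Under gunicorn's gevent worker, monkey patching is in play, which is where the threading.local advice in the next answer comes in.)

import gevent
import threading

def worker(name):
    threading.currentThread().xxxxx = name   # stored on the shared thread object
    gevent.sleep(0)                          # yield; the other greenlet runs and overwrites it
    print('%s now sees %s' % (name, threading.currentThread().xxxxx))

gevent.joinall([gevent.spawn(worker, 'A'), gevent.spawn(worker, 'B')])
# typically prints: A now sees B / B now sees B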
You should create a threading.local in the main thread.
After monkey patching, gevent replaces threading.local with its greenlet-local equivalent (gevent.local.local), so you can save per-coroutine data via:
import threading

threadlocal = threading.local()

def func_in_thread():
    # set data
    setattr(threadlocal, "_key", "_value")
    # do something
    # do something
    getattr(threadlocal, "_key", None)
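A usage sketch of my own (not part of the answer above) to illustrate the isolation, assuming threading.local really is greenlet-local after monkey patching:

from gevent import monkey
monkey.patch_all()

import threading
import gevent

threadlocal = threading.local()

def worker(name):
    threadlocal.key = name   # greenlet-local after monkey patching
    gevent.sleep(0)          # let the other greenlet run
    print('%s still sees %s' % (name, threadlocal.key))

gevent.joinall([gevent.spawn(worker, 'A'), gevent.spawn(worker, 'B')])
# expected: A still sees A / B still sees B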
I'm trying to build a bridge between two protocols based on existing libraries, basically doing something based on an event (like transmitting or announcing a message). The problem is that one library uses a gevent loop and the other uses an asyncio loop, so I'm not able to use the built-in loop functionality to trigger signal/event actions on the other loop, and I basically have no way to access the other loop.
How do I set up event-based communication between them? I can't seem to access the other loop from within the existing one. I feel like I'm overthinking this.
Is there some way to do it via multithreading, by sharing objects between the loops?
Sample code:
import libraryBot1
import libraryBot2

bot1 = libraryBot1.Client()
bot2 = libraryBot2.Client()

@bot1.on('chat_message')
def handle_message(user, message_text):
    bot2.send(message_text)

@bot2.on('send')
def handle_message(message_text):
    print(message_text)

if __name__ == "__main__":
    # If I login here, then run_forever runs behind the scenes,
    # so I can't reach the second connection.
    bot1.login(username="username", password="password")

    # Never reached
    bot2.login(username="username", password="password")
If I instead try to use multithreading, then both of them are started, but they can't access each other (communicate).
Here is an example using only gevent. It might be possible to wrap the greenlets in such a way that it would be compatible with asyncio:
import gevent
from gevent.pool import Pool
from gevent.event import AsyncResult

a = AsyncResult()
pool = Pool(2)

def shared(stuff):
    print(stuff)

pool.spawn(bot1.login, username="username", password="password", event=a, shared=shared)
pool.spawn(bot2.login, username="username", password="password", event=a, shared=shared)

# and then inside both bots you could do something like this:
if event.get() == 'ready':
    shared('some other result to share')
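As for bridging to asyncio specifically, the example above stays inside gevent. Here is a hedged sketch of my own, building on the multithreading idea from the question (all names are placeholders, and it assumes gevent's monkey patching has not replaced the real threading module): run the asyncio loop in its own OS thread and hand events over with call_soon_threadsafe, one of the few loop methods that is safe to call from another thread.

import asyncio
import threading
import gevent

aio_loop = asyncio.new_event_loop()

def run_aio_loop():
    asyncio.set_event_loop(aio_loop)
    aio_loop.run_forever()

threading.Thread(target=run_aio_loop, daemon=True).start()

def forward_to_asyncio(text):
    # called from a gevent greenlet; schedules the callback on the asyncio loop
    aio_loop.call_soon_threadsafe(print, 'asyncio side got:', text)

def gevent_side():
    for i in range(3):
        forward_to_asyncio('message %d' % i)
        gevent.sleep(0.1)

gevent.joinall([gevent.spawn(gevent_side)])
gevent.sleep(0.5)   # give the asyncio thread a moment to drain its callbacks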
Related:
aiogevent (deleted from PyPI): https://pypi.python.org/pypi/aiogevent/0.2
(see https://github.com/gevent/gevent/issues/982)
http://sdiehl.github.io/gevent-tutorial/#events
I'm aware this is normally done with twistd, but I want to use IPython to test code 'live' against Twisted code.
How to start twisted's reactor from ipython asked basically the same thing, but the first solution no longer works with current IPython/Twisted, while the second is also unusable (the thread raises multiple errors).
https://gist.github.com/kived/8721434 has something called TPython which purports to do this, but running it seems to work except that clients never connect to the server (while the same clients work fine from the plain Python shell).
Do I have to use Conch Manhole, or is there a way to get IPython to play nice (probably with _threadedselect)?
For reference, I'm using IPython 5.0.0, Python 2.7.12, Twisted 16.4.1.
Async code in general can be troublesome to run in a live interpreter. It's often best just to run an async script in the background and do your IPython work in a separate interpreter, intercommunicating via files or TCP. If that went over your head, that's because it's not always simple, and it might be best to avoid the hassle if possible.
However, you'll be happy to know there is an awesome project called crochet for using Twisted in non-async applications. It truly is one of my favorite modules and I'm shocked it's not more widely used (you can change that ;D though). The crochet module has a run_in_reactor decorator that runs a Twisted reactor in a separate thread managed by crochet itself. Here is a quick class example that executes requests to a Star Wars RESTful API, then stores the JSON responses in a list.
from __future__ import print_function
import json
from twisted.internet import defer, task
from twisted.web.client import getPage
from crochet import run_in_reactor, setup as setup_crochet

setup_crochet()

class StarWarsPeople(object):
    people_id = [_id for _id in range(1, 89)]
    people = []

    @run_in_reactor
    def requestPeople(self):
        """
        Request Star Wars JSON data from the SWAPI site.
        This occurs in a Twisted reactor in a separate thread.
        """
        for _id in self.people_id:
            url = 'http://swapi.co/api/people/{0}'.format(_id).encode('utf-8')
            d = getPage(url)
            d.addCallback(self.appendJSON)

    def appendJSON(self, response):
        """
        A callback which will take the response from the getPage() request,
        convert it to JSON, then append it to self.people, which can be
        accessed outside of the crochet thread.
        """
        response_json = json.loads(response.decode('utf-8'))
        #print(response_json)  # uncomment if you want to see output
        self.people.append(response_json)
Save this in a file (example: swapi.py), open iPython, import the newly created module, then run a quick test like so:
from swapi import StarWarsPeople
testing = StarWarsPeople()
testing.requestPeople()

from time import sleep
for x in range(5):
    print(len(testing.people))
    sleep(2)
As you can see it runs in the background and stuff can still occur in the main thread. You can continue using the iPython interpreter as you usually do. You can even have a manhole running in the background for some cool hacking too!
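If you would rather block at the IPython prompt until a result is ready, crochet also provides a wait_for decorator. A small sketch of my own (not part of the answer above; it just reuses the SWAPI endpoint from the example):

from crochet import setup, wait_for
setup()

from twisted.web.client import getPage

@wait_for(timeout=10.0)
def get_person(person_id):
    # runs in the crochet-managed reactor thread; the caller blocks until the result arrives
    url = 'http://swapi.co/api/people/{0}/'.format(person_id).encode('utf-8')
    return getPage(url)

# body = get_person(1)   # returns the raw response bytes, or raises on timeout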
References
https://crochet.readthedocs.io/en/1.5.0/introduction.html#crochet-use-twisted-anywhere
While this doesn't answer the question I thought I had, it does answer (sort of) the question I posted. Embedding ipython works in the sense that you get access to business objects with the reactor running.
from twisted.internet import reactor
from twisted.internet.endpoints import serverFromString
from myfactory import MyFactory

class MyClass(object):
    def __init__(self, **kwargs):
        super(MyClass, self).__init__(**kwargs)
        server = serverFromString(reactor, 'tcp:12345')
        server.listen(MyFactory(self))

        def interact():
            import IPython
            IPython.embed()

        reactor.callInThread(interact)

if __name__ == "__main__":
    myclass = MyClass()
    reactor.run()
Call the above with python myclass.py or similar.
Let's say I have a function:
from time import sleep

def doSomethingThatTakesALongTime(number):
    print number
    sleep(10)
and then I call it in a for loop
for number in range(10):
    doSomethingThatTakesALongTime(number)
How can I set this up so that it only takes 10 seconds TOTAL to print out:
$ 0123456789
instead of taking 100 seconds? If it helps, I'm going to use the information YOU provide to do asynchronous web scraping; i.e., I have a list of sites I want to visit, but I want to visit them simultaneously rather than wait for each one to complete.
Try Eventlet. The first example in its documentation shows how to implement simultaneous URL fetching:
urls = ["http://www.google.com/intl/en_ALL/images/logo.gif",
        "https://wiki.secondlife.com/w/images/secondlife.jpg",
        "http://us.i1.yimg.com/us.yimg.com/i/ww/beta/y3.gif"]

import eventlet
from eventlet.green import urllib2

def fetch(url):
    return urllib2.urlopen(url).read()

pool = eventlet.GreenPool()
for body in pool.imap(fetch, urls):
    print "got body", len(body)
I can also recommend looking at Celery for a more flexible solution.
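For completeness, a hedged sketch of what the Celery route could look like (the broker URL, module name, and task layout are my own assumptions, not part of this answer):

# tasks.py
from celery import Celery, group
import urllib2

app = Celery('tasks', broker='redis://localhost:6379/0')   # assumed broker

@app.task
def fetch(url):
    return urllib2.urlopen(url).read()

# after starting a worker (celery -A tasks worker), fan the URLs out:
# bodies = group(fetch.s(u) for u in urls)().get()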
asyncoro supports asynchronous, concurrent programming. It includes an asynchronous (non-blocking) socket implementation. If your implementation does not need urllib/httplib etc. (which don't have asynchronous completions), it may fit your purpose (and it is easy to use, as it is very similar to programming with threads). Your problem above with asyncoro:
import asyncoro

def do_something(number, coro=None):
    print number
    yield coro.sleep(10)

for number in range(10):
    asyncoro.Coro(do_something, number)
Take a look at the Scrapy framework. It's designed specifically for web scraping and is very good. It is asynchronous and built on the Twisted framework.
http://scrapy.org/
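A minimal spider sketch of my own (the spider name, URLs, and parse logic are placeholders, not from this answer), just to show the shape of it:

import scrapy

class FetchSpider(scrapy.Spider):
    name = 'fetch_example'
    start_urls = ['http://www.example.com/', 'http://www.example.org/']

    def parse(self, response):
        # Scrapy downloads start_urls concurrently and calls parse()
        # for each response as it arrives
        yield {'url': response.url, 'length': len(response.body)}

# run with: scrapy runspider fetch_spider.py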
Just in case, this is the exact way to apply green threads to your example snippet:
from eventlet.green.time import sleep
from eventlet.greenpool import GreenPool

def doSomethingThatTakesALongTime(number):
    print number
    sleep(10)

pool = GreenPool()

for number in range(100):
    pool.spawn_n(doSomethingThatTakesALongTime, number)

import timeit
print timeit.timeit("pool.waitall()", "from __main__ import pool")
# yields : 10.9335260363
Possible Duplicate:
Multiprocessing launching too many instances of Python VM
I'm trying to use Python multiprocessing to parallelize web fetching, but I'm finding that the application calling multiprocessing gets instantiated multiple times, not just the function I want called (which is a problem for me, as the caller has some dependencies on a library that is slow to instantiate, losing most of my performance gains from parallelism).
What am I doing wrong or how is this avoided?
my_app.py:

from url_fetcher import url_fetch, parallel_fetch
import my_slow_stuff

if __name__ == '__main__':
    import datetime
    urls = ['http://www.microsoft.com'] * 20
    results = parallel_fetch(urls, fn=url_fetch)
    print([x[:20] for x in results])

my_slow_stuff.py:

class MySlowStuff(object):
    import time
    print('doing slow stuff')
    time.sleep(0)
    print('done slow stuff')
url_fetcher.py:

import multiprocessing
import urllib

def url_fetch(url):
    #return urllib.urlopen(url).read()
    return url

def parallel_fetch(urls, fn):
    PROCESSES = 10
    CHUNK_SIZE = 1
    pool = multiprocessing.Pool(PROCESSES)
    results = pool.imap(fn, urls, CHUNK_SIZE)
    return results

if __name__ == '__main__':
    import datetime
    urls = ['http://www.microsoft.com'] * 20
    results = parallel_fetch(urls, fn=url_fetch)
    print([x[:20] for x in results])
partial output:
$ python my_app.py
doing slow stuff
done slow stuff
doing slow stuff
done slow stuff
doing slow stuff
done slow stuff
doing slow stuff
done slow stuff
doing slow stuff
done slow stuff
...
The Python multiprocessing module behaves slightly differently on Windows because Python doesn't implement os.fork() on that platform. In particular:
Safe importing of main module
Make sure that the main module can be safely imported by a new Python interpreter without causing unintended side effects (such as starting a new process).
Here, the module-level class MySlowStuff always gets evaluated by newly started child processes on Windows. To fix that, class MySlowStuff should be defined only when __name__ == '__main__'.
See 16.6.3.2. Windows for more details.
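A sketch of the general idea (my own restructuring, not the asker's actual fix): keep the slow, side-effecting work out of module level and trigger it explicitly from the parent's __main__ block, so child processes that re-import the modules on Windows never run it.

# my_slow_stuff.py
class MySlowStuff(object):
    pass

def slow_init():
    import time
    print('doing slow stuff')
    time.sleep(0)
    print('done slow stuff')

# my_app.py
# from url_fetcher import url_fetch, parallel_fetch
# import my_slow_stuff
#
# if __name__ == '__main__':
#     my_slow_stuff.slow_init()   # runs once, in the parent process only
#     urls = ['http://www.microsoft.com'] * 20
#     print([x[:20] for x in parallel_fetch(urls, fn=url_fetch)])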
The multiprocessing module on Windows doesn't work the same as on Unix/Linux. On Linux it uses the fork system call, and all the context is copied/duplicated to the new process exactly as it is at the time of the fork.
The fork system call does not exist on Windows, so the multiprocessing module has to create a new Python process and load all the modules again; this is why the Python library documentation forces you to use the if __name__ == '__main__' trick when using multiprocessing on Windows.
The solution in this case is to use threads instead. This is an IO-bound workload, so the main advantage of multiprocessing, avoiding GIL problems, does not apply to you.
More info at http://docs.python.org/library/multiprocessing.html#windows
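A hedged sketch of the threads-instead suggestion (my adaptation of the question's url_fetcher.py, not the answerer's code): multiprocessing.dummy exposes the same Pool API backed by threads, so only the import changes and the real fetch line can be restored.

# url_fetcher.py, thread-based variant
from multiprocessing.dummy import Pool   # same API as multiprocessing.Pool, but uses threads
import urllib

def url_fetch(url):
    return urllib.urlopen(url).read()

def parallel_fetch(urls, fn):
    PROCESSES = 10
    CHUNK_SIZE = 1
    pool = Pool(PROCESSES)
    return pool.imap(fn, urls, CHUNK_SIZE)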