Strange problems when using requests and multiprocessing - python

Please check this python code:
#!/usr/bin/env python
import requests
import multiprocessing
from time import sleep, time
from requests import async
def do_req():
    r = requests.get("http://w3c.org/")
def do_sth():
    while True:
        sleep(10)
if __name__ == '__main__':
    do_req()
    multiprocessing.Process( target=do_sth, args=() ).start()
When I press Ctrl-C (after waiting about 2 seconds so that the Process is running), it doesn't stop. When I change the import order to:
from requests import async
from time import sleep, time
it stops after Ctrl-C. Why doesn't it stop or get killed in the first example?
Is it a bug or a feature?
Notes:
Yes, I know that I don't use async in this code; it is a stripped-down version to simplify my question. In the real code I do use it.
After pressing Ctrl-C there is a new (child) process running. Why?
multiprocessing.__version__ == 0.70a1, requests.__version__ == 0.11.2, gevent.__version__ == 0.13.7

The requests async module uses gevent. If you look at gevent's source code, you will see that it monkey patches many of Python's standard library functions, including sleep.
On import, the requests.async module executes:
from gevent import monkey as curious_george
# Monkey-patch.
curious_george.patch_all(thread=False, select=False)
Looking at the monkey.py module of gevent you can see:
https://bitbucket.org/denis/gevent/src/f838056c793d/gevent/monkey.py#cl-128
def patch_time():
    """Replace :func:`time.sleep` with :func:`gevent.sleep`."""
    from gevent.hub import sleep
    import time
    patch_item(time, 'sleep', sleep)
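Because the patch replaces the sleep attribute on the time module, the import order matters: from time import sleep executed before requests.async is imported binds the original function, while the same import executed afterwards binds gevent's version.
Take a look at the code in gevent's repository for details.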
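The effect of import order can be reproduced without gevent. The sketch below is only an illustration: patched_sleep is a hypothetical stand-in for gevent's sleep, and the assignment to time.sleep mimics what a monkey patch does in essence. It shows that a name imported before the patch keeps pointing at the original function, while a name imported after the patch picks up the replacement.

```python
import time

original_sleep = time.sleep


def patched_sleep(seconds):
    """Hypothetical stand-in for gevent.sleep; it just delegates to the original."""
    return original_sleep(seconds)


# Bound before patching: this name still refers to the original function.
from time import sleep as early_sleep

# In essence, this is what a monkey patch does to the module attribute.
time.sleep = patched_sleep

# Bound after patching: this name refers to the replacement.
from time import sleep as late_sleep

early_is_original = early_sleep is original_sleep
late_is_patched = late_sleep is patched_sleep

# Undo the patch so the rest of the program is unaffected.
time.sleep = original_sleep
```

In the question, the first import order corresponds to early_sleep and the second to late_sleep.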

Related

How do I 'spam' an url via python?

I want to make a script that automatically sends a lot of requests to a URL via Python.
Example link: https://page-views.glitch.me/badge?page_id=page.id
I've tried Selenium, but that's very slow.
pip install requests
import requests
for i in range(100): # Or whatever amount of requests you wish to send
    requests.get("https://page-views.glitch.me/badge?page_id=page.id")
Or, if you really want to hammer the address, you could use multiprocessing:
import multiprocessing as mp
import requests
def my_func(x):
    for i in range(x):
        print(requests.get("https://page-views.glitch.me/badge?page_id=page.id"))
def main():
    pool = mp.Pool(mp.cpu_count())
    pool.map(my_func, range(0, 100))
if __name__ == "__main__":
    main()
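A thread-based variant is sketched below as one possible alternative; since the work is network-bound, threads may serve as well as processes here. The helper name send_many, its parameters, and the default worker count are assumptions made for illustration and are not part of the answer above. Note that pool.map(my_func, range(0, 100)) calls my_func with each value from 0 to 99, so it sends 4950 requests in total rather than 100; the sketch takes the total as an explicit argument instead.

```python
from concurrent.futures import ThreadPoolExecutor


def send_many(fetch, url, total, workers=8):
    """Call fetch(url) `total` times using a pool of worker threads.

    `fetch` is any callable that takes a URL, for example requests.get
    (an assumption; any HTTP client function would do).
    Returns the results in submission order.
    """
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(fetch, [url] * total))
```

With requests installed, a call might look like send_many(requests.get, "https://page-views.glitch.me/badge?page_id=page.id", 100).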
You can send multiple get() requests in a loop as follows:
# driver is an already created Selenium WebDriver instance
for i in range(100):
    driver.get("https://page-views.glitch.me/badge?page_id=page.id")

Passing Variables to a process python

I need help modifying my code so that I can control what is happening in a process. From what I have read, I need either a global variable that the process can read or an event to signal the process. The problem is that I don't know how to implement either of them in a class method. I thought that following the pyimagesearch code would work, but it appears to work only with the threading module and not with the multiprocessing module.
import RPi.GPIO as GPIO
from RPI.GPIO import LOW,OUT,HIGH,BCM
import multiprocessing as mp
import time
class TestClass():
    def __init__(self,PinOne=22,PinTwo=27):
        self.PinOne = PinOne
        self.PinTwo = PinTwo
        self.RunningSys = True
        GPIO.setmode(BCM)
        GPIO.setup(PinOne,OUT)
        GPIO.output(PinOne,LOW)
        GPIO.setup(PinTwo,OUT)
        GPIO.output(PinTwo,LOW)
    def Testloop(self):
        while self.RunningSys:
            GPIO.output(PinOne,HIGH)
            GPIO.output(PinTwo,HIGH)
            time.sleep(1)
            GPIO.output(PinOne,LOW)
            GPIO.output(PinTwo,LOW)
        GPIO.output(PinOne,LOW)
        GPIO.output(PinTwo,LOW)
    def StopPr(self):
        self.RunningSys = False
    def MProc(self):
        MPGP = mp.process(target=TestClass().Testloop())
        MPGP.start()
        MPGP.join()
In a separate script:
from testfile import TestClass
import time
TestClass().MProc()
time.sleep(4)
TestClass().StopPr()
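One way to stop a child process from the parent is a multiprocessing.Event, as the question hints. The sketch below is a minimal illustration under assumptions: blink_loop and run_for are hypothetical names, and a shared counter stands in for the GPIO calls so that the example runs without a Raspberry Pi. Two things differ from the code above. The target is passed as a function object with its arguments (target=blink_loop, not the result of calling it), and the stop signal is an Event rather than an instance attribute, because an attribute changed in the parent after the child has started is not visible in the child process.

```python
import multiprocessing as mp
import time


def blink_loop(stop_event, counter):
    """Run until stop_event is set; the counter stands in for toggling pins."""
    while not stop_event.is_set():
        with counter.get_lock():
            counter.value += 1
        # wait() returns as soon as the event is set, so shutdown is prompt.
        stop_event.wait(0.1)


def run_for(seconds):
    """Start the worker process, let it run, then signal it to stop."""
    stop_event = mp.Event()
    counter = mp.Value("i", 0)
    proc = mp.Process(target=blink_loop, args=(stop_event, counter))
    proc.start()
    time.sleep(seconds)   # the parent is free to do other work here
    stop_event.set()      # ask the child to leave its loop
    proc.join(timeout=5)
    return counter.value, proc.is_alive()
```

Calling run_for(0.5) returns the number of loop iterations and whether the child is still alive. On platforms where multiprocessing spawns a fresh interpreter, the call should sit under an if __name__ == "__main__": guard. Note also that MProc in the question calls join() immediately after start(), which blocks the caller until the loop ends; the sketch joins only after setting the event.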

django_apscheduler remove_all_jobs on django startup

I have used django_apscheduler to schedule jobs, and it works fine. When I start the server, a new job is added and it periodically does what I need. However, if I exit Django and start it again, Django fails with this error:
apscheduler.jobstores.base.ConflictingIdError: u'Job identifier (myapp_db.jobs.test_job) conflicts with an existing job'
Basically, the old job still exists in the database, so the new job cannot be created.
How can I remove all jobs during Django startup?
I noticed there is a remove_all_jobs() function in apscheduler, but I do not know where to call it from.
I'm starting jobs.py from url.py with
import myapp.jobs
Thanks.
code:
import time
import sys
import requests
from bs4 import BeautifulSoup
import re
from kuce_db.models import NjuskaloData, UserVisitedData
import logging
from django.contrib.auth.models import User
from apscheduler.schedulers.background import BackgroundScheduler
from django_apscheduler.jobstores import DjangoJobStore, register_events, register_job
scheduler = BackgroundScheduler()
scheduler.add_jobstore(DjangoJobStore(), "default")
def get_trailing_number(s):
    m = re.search(r'\d+$', s)
    return int(m.group()) if m else None
@register_job(scheduler, "interval", seconds=300)
def test_job():
    print("I'm a test job!")
register_events(scheduler)
#
scheduler.start()
print("Scheduler started!")
logging.basicConfig()
Use the replace_existing flag to tell the scheduler to replace the existing job:
@register_job(scheduler, "interval", seconds=300, replace_existing=True)
def test_job():
    print("I'm a test job!")

python 3 - gevent threads not running in parallel

I'm trying to use gevent greenlets to create some long-running threads, like workers, but I have a problem when asking the user for input, e.g.:
import gevent
import time
from gevent import monkey
monkey.patch_all()
def func():
    while True:
        print('working')
        time.sleep(2)
def func2():
    while True:
        print(input('input: '))
gevent.joinall([
    gevent.spawn(func),
    gevent.spawn(func2)
])
This runs func once and then waits for user input, but it stops printing "working".
I also tried waiting for input on the main thread:
from gevent import Greenlet
import time
from gevent import monkey
monkey.patch_all()
def func():
    while True:
        print('working')
        time.sleep(2)
g = Greenlet(func)
g.start()
while True:
    print(input('input: '))
But in that case the greenlet does not start at all. Where am I going wrong?
Thanks in advance.

Python Event loop w/ gevent

import gevent
from gevent.event import AsyncResult
import time
class Job(object):
    def __init__(self, name):
        self.name = name
def setter(job):
    print 'starting'
    gevent.sleep(3)
    job.result.set('%s done' % job.name)
def waiter(job):
    print job.result.get()
# event loop
running = []
for i in range(5):
    print 'creating'
    j = Job(i)
    j.result = AsyncResult()
    running.append(gevent.spawn(setter, j))
    running.append(gevent.spawn(waiter, j))
print 'started greenlets, event loop go do something else'
time.sleep(5)
gevent.joinall(running)
gevent doesn't actually start running the greenlets until joinall is called.
Is there something that would start the spawned greenlets asynchronously? Why do they not start as soon as spawn is called?
Is there a select/epoll equivalent for running greenlets, to see which one needs to be joined, instead of joinall()?
No, it does not start straight away. It will start as soon as your main greenlet yields to the hub (releases control by calling sleep or join, for example).
Clearly your intention is that it starts when you call time.sleep. It does not, because you have not monkey patched it.
Add these lines to the very top of your file:
from gevent import monkey
monkey.patch_all()
This will then give you the behaviour that you want, because under the hood time.sleep is replaced with a version that yields to the hub.
Alternatively, you can call gevent.sleep.
Since you did not monkey patch, time.sleep() is causing your app to pause. Use gevent.sleep(5) instead.
The very first step should be monkey patching:
from gevent import monkey
monkey.patch_all()
This will spawn the greenlets asynchronously.
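The same cooperative-scheduling behaviour can be seen with the standard library's asyncio, which is used here purely as an analogy and is not gevent: a scheduled task does not run until the current code yields to the event loop, and a blocking time.sleep never yields. The names worker and main are assumptions made for this sketch.

```python
import asyncio
import time


async def worker(log, name):
    log.append(f"{name} started")
    await asyncio.sleep(0.01)
    log.append(f"{name} done")


async def main():
    log = []
    # Tasks are scheduled here but have not run yet, much like gevent.spawn.
    tasks = [asyncio.create_task(worker(log, i)) for i in range(3)]

    # A blocking sleep never yields to the event loop, so nothing has started.
    time.sleep(0.05)
    after_blocking = list(log)

    # A cooperative sleep yields once, which lets the scheduled tasks start.
    await asyncio.sleep(0)
    after_yield = list(log)

    # Waiting on the tasks plays the role of gevent.joinall here.
    await asyncio.gather(*tasks)
    return after_blocking, after_yield, log


after_blocking, after_yield, final_log = asyncio.run(main())
```

Here after_blocking is empty while after_yield already contains the three "started" entries, which mirrors the difference between time.sleep and gevent.sleep in the unpatched question code.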
