Having issues passing counts in my scheduler function - python

I'm working with the schedule package built by Dan Bader and I'm not sure how to pass a counter into the package.
Here is the basic example of how this is used:
def job(message="stuff"):
    print("I'm working on:", str(message))

schedule.every(10).seconds.do(job)

while True:
    schedule.run_pending()
    time.sleep(1)
So this prints I'm working on: stuff every 10 seconds, which is great, but what it's missing for my application is a counter. I'd like to count every time a job is run and pass that into my function, but I haven't been able to figure that out.
This was my latest attempt:
def job(count=0):
    print(count)

count = 0
schedule.every(10).seconds.do(job, count)

while True:
    schedule.run_pending()
    time.sleep(1)
    count += 1
I thought that having count as a parameter and looping it within the while loop would work but it did not.
At the end of the day, I need a platform that will allow me to constantly run a function at some interval and keep track of how many times the job has run.
Any help would be much appreciated

Well, not sure it's the most obvious way, but I would do something like
import schedule
import time

class Count(object):
    def __init__(self):
        self._count = 0
    def __str__(self):
        count = self._count
        self._count += 1
        return str(count)

def job(count):
    print(count)

count = Count()
schedule.every(1).seconds.do(job, count)

while True:
    schedule.run_pending()
The Count instance increments itself when printed, so as long as you make sure to print it only in job, it keeps track of how many times the job was run.
The only real advantage over the dict solution is that you don't have to compute anything in job - count handles itself alone.
 Edit
Since some SO folks seem to like my idea even though (and that's clearly true) it hides some functional aspects of the counter (nobody expects an instance that increments an integer when printed), here is a little update that makes it less tricky and less confusing - in brief: better.
class CountLogger(object):
    def __init__(self):
        self._count = 0
    def log(self, message=''):
        print('{}: {}'.format(self._count, message))
        self._count += 1
It does almost exactly the same thing, but calling CountLogger.log() to see an integer get incremented is less puzzling.
Just replace print(count) with count.log().

You can use a mutable object like a dictionary to achieve what you want. (Your original count variable is an integer, which is NOT mutable in Python.) The following works:
import schedule
import time

store = {'count': 0}

def job(data):
    data['count'] += 1
    print(data['count'])

schedule.every(10).seconds.do(job, store)

while True:
    schedule.run_pending()
    time.sleep(1)
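If you'd rather not thread a dict through the call at all, a closure over `itertools.count` gives each job its own counter (a sketch; `make_job` is an illustrative name, and the `schedule` registration is shown only as a comment since the principle is the same):

```python
import itertools

def make_job():
    counter = itertools.count(1)  # yields 1, 2, 3, ...
    def job():
        n = next(counter)
        print("run number", n)
        return n
    return job

job = make_job()
# with the schedule package you would register it as:
# schedule.every(10).seconds.do(job)
```

Each call to `make_job()` produces an independent job with its own count, so two scheduled jobs never interfere with each other.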

Related

Using multiprocessing with a dictionary that needs to be locked

I am trying to use Python's multiprocessing library to speed up some code I have. I have a dictionary whose values need to be updated based on the result of a loop. The current code looks like this:
def get_topic_count():
    topics_to_counts = {}
    for news in tqdm.tqdm(RawNews.objects.all().iterator()):
        for topic in Topic.objects.filter(is_active=True):
            if topic.name not in topics_to_counts.keys():
                topics_to_counts[topic.name] = 0
            if topic.name.lower() in news.content.lower():
                topics_to_counts[topic.name] += 1
    for key, value in topics_to_counts.items():
        print(f"{key}: {value}")
I believe the worker function should look like this:
def get_topic_count_worker(news, topics_to_counts, lock):
    for topic in Topic.objects.filter(is_active=True):
        if topic.name not in topics_to_counts.keys():
            lock.acquire()
            topics_to_counts[topic.name] = 0
            lock.release()
        if topic.name.lower() in news.content.lower():
            lock.acquire()
            topics_to_counts[topic.name] += 1
            lock.release()
However, I'm having some trouble writing the main function. Here's what I have so far, but I keep getting a "process killed" message - I believe it's using too much memory.
def get_topic_count_master():
    topics_to_counts = {}
    raw_news = RawNews.objects.all().iterator()
    lock = multiprocessing.Lock()
    args = []
    for news in tqdm.tqdm(raw_news):
        args.append((news, topics_to_counts, lock))
    with multiprocessing.Pool() as p:
        p.starmap(get_topic_count_worker, args)
    for key, value in topics_to_counts.items():
        print(f"{key}: {value}")
Any guidance here would be appreciated!
Update: There are about 1.6 million records that it needs to go through. How would I chunk this properly?
Update 2: Here's some sample data:
Update 3:
Here is the relation in the RawNews table:
topics = models.ManyToManyField('Topic', blank=True)
The problem was related to a restriction on the database. Multiprocessing does speed things up, but the database has a limit of 100 connections at a single time. Either increase this connection limit or cap the number of workers at any given time to a number below 100.
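On the chunking question: rather than sharing one dict and a lock across workers, each worker can count its own chunk and the parent can merge the partial results, which avoids both the lock contention and building a 1.6-million-element args list up front. A generic sketch of that pattern (plain strings stand in for the Django `RawNews`/`Topic` querysets, and `TOPICS` is an illustrative stand-in for the active topic names):

```python
import multiprocessing
from collections import Counter

TOPICS = ["python", "simpy"]  # stand-in for the active Topic names

def count_topics(chunk):
    # each worker counts only its own chunk: no shared state, no locks
    c = Counter()
    for text in chunk:
        lowered = text.lower()
        for topic in TOPICS:
            if topic in lowered:
                c[topic] += 1
    return c

def get_topic_count(texts, chunksize=1000):
    # split the records into bounded chunks so no single task is huge
    chunks = [texts[i:i + chunksize] for i in range(0, len(texts), chunksize)]
    with multiprocessing.Pool() as pool:
        # merge the per-worker Counters in the parent process
        return sum(pool.map(count_topics, chunks), Counter())
```

Merging `Counter` objects in the parent sidesteps the fact that a plain dict passed to workers is copied per process and the parent never sees the workers' writes - which is also why the original `topics_to_counts` stayed empty.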

Python SimPy: yield env.timeout not working

I'm just getting started with SimPy so maybe I'm missing something major here. I have a very simple Process which I want to just increment a number once per second.
import simpy

class Thing():
    def __init__(self):
        self.x = 1
    def go(self, env):
        while True:
            self.x = self.x + 1
            print("Current value: {}".format(self.x))
            yield env.timeout(1)

class Simulation:
    def __init__(self):
        self.env = simpy.Environment()
        thing = Thing()
        p = self.env.process(thing.go(self.env))
        self.env.run()

simulation = Simulation()
The timeout seems to be getting ignored here. It doesn't matter what value I pass as the delay, or whether I include the line at all: x increases at the same rate (over 1 million within seconds).
You seem to be confusing real time with sim time. Have your print statement include the current simulation time (env.now). You should get something like this:
Current value: 2, time now: 0
Current value: 3, time now: 1
Current value: 4, time now: 2
Current value: 5, time now: 3
:
So you can see the simulation clock is incrementing as it should; the simulation is just really fast. If you want to run in real time, don't use simpy.Environment(). Instead, use simpy.rt.RealtimeEnvironment().
See here for more details:
https://simpy.readthedocs.io/en/latest/topical_guides/real-time-simulations.html

Conditionally increase integer count with an if statement in python

I'm trying to increase the count of an integer when an if statement returns true. However, when this program is run it always prints 0. I want n to increase to 1 the first time the program is run, to 2 the second time, and so on.
I know that in functions, classes and modules you can use the global keyword to reach outside them, but this doesn't work with an if statement.
n = 0
print(n)
if True:
    n += 1
Based on the comments of the previous answer, do you want something like this:
n = 0
while True:
    if True:  # Replace True with any other condition you like.
        print(n)
        n += 1
EDIT:
Based on the comments by the OP on this answer, what he wants is for the data to persist - or, more precisely, for the variable n to persist (keep its new, modified value) between runs.
So the code for that goes as follows (assuming Python 3.x):
try:
    file = open('count.txt', 'r')
    n = int(file.read())
    file.close()
except IOError:
    file = open('count.txt', 'w')
    file.write('1')
    file.close()
    n = 1

print(n)
n += 1
with open('count.txt', 'w') as file:
    file.write(str(n))
print("Now the variable n persists and is incremented every time.")
# Do what you want to do further; the value of n will increase every time you run the program
NOTE:
There are many methods of object serialization and the above example is one of the simplest, you can use dedicated object serialization modules like pickle and many others.
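As a slightly more robust variant of the same idea, the load/save steps can be wrapped in functions, here using the stdlib `json` module (the filename `count.json` is just an example):

```python
import json

COUNT_FILE = 'count.json'  # example path

def load_count():
    # return the persisted count, or 0 on the first run / unreadable file
    try:
        with open(COUNT_FILE) as f:
            return json.load(f)['count']
    except (OSError, ValueError, KeyError):
        return 0

def bump_count():
    # increment the persisted count and return the new value
    n = load_count() + 1
    with open(COUNT_FILE, 'w') as f:
        json.dump({'count': n}, f)
    return n
```

Calling `bump_count()` once per program run gives 1 on the first run, 2 on the second, and so on, which is exactly the behavior asked for.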
If you want it to work with an if statement only, I think you need to put it in a function and make it call itself, which we would call recursion.
def increment():
    n = 0
    if True:
        n += 1
        print(n)
        increment()

increment()
Note: this solution would recurse infinitely (until Python's recursion limit is hit).
Also you can use while loop or for loop as well.
When you rerun a program, all data stored in memory is reset. You need to save the variable somewhere outside of the program, on disk.
For an example, see How to increment variable every time script is run in Python?
PS: nowadays you can simply do += with a bool:
a = 1
b = True
a += b # a will be 2

python schedule module to periodically run a function

Goal: run a function every day at a randomized time between two times.
So, I wrote this function to randomly generate a time (please offer feedback on how to streamline it - I couldn't find this in an existing package, but it MUST already exist...)
import numpy as np

def gen_rand_time(low, high):
    hour = np.random.randint(low, high)
    minute = np.random.randint(1, 59)
    if minute < 10:
        time = str(hour) + ':' + str(0) + str(minute)
        return time
    else:
        time = str(hour) + ':' + str(minute)
        return time
Next I define the function I would like to run. Keeping it nice and simple.
def test(a):
    print('TEST: ' + str(a))
Now I want to run this function on a periodic basis. I use the schedule package.
def run_bot():
    time1 = str(gen_rand_time(18, 19))
    print(time1)
    schedule.every(1).days.at(time1).do(test('TEST WORKED'))
    while True:
        schedule.run_pending()
        time.sleep(1)

run_bot()
When I run run_bot() with a time in the immediate future (say, one minute out), the test() function prints "TEST: TEST WORKED" immediately, without waiting for the specified random time.
You should probably try ... do(test,'TEST WORKED')
instead of ... do(test('TEST WORKED')), see this.
Besides, it seems that you cannot use the same value for low and high (I wonder if you actually tried what you posted).
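On the streamlining question: the zero-padding branch can be replaced with a format spec, and the stdlib `random.randint` (inclusive on both ends, unlike `np.random.randint`, which excludes the high value) removes the numpy dependency. A sketch:

```python
import random

def gen_rand_time(low, high):
    # random HH:MM with low <= hour <= high; the :02d spec zero-pads single digits
    hour = random.randint(low, high)
    minute = random.randint(0, 59)
    return '{:02d}:{:02d}'.format(hour, minute)
```

With inclusive bounds, gen_rand_time(18, 18) is also valid and always yields an 18:xx time.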

Separating Progress Tracking and Loop Logic

Suppose I want to track the progress of a loop using the progress bar printer ProgressMeter (as described in this recipe).
def bigIteration(collection):
    for element in collection:
        doWork(element)
I would like to be able to switch the progress bar on and off. I also want to update it only every x steps for performance reasons. My naive way to do this is
def bigIteration(collection, progressBar=True):
    if progressBar:
        pm = progress.ProgressMeter(total=len(collection))
        pc = 0
    for element in collection:
        if progressBar:
            pc += 1
            if pc % 100 == 0:
                pm.update(pc)
        doWork(element)
However, I am not satisfied. From an "aesthetic" point of view, the functional code of the loop is now "contaminated" with generic progress-tracking code.
Can you think of a way to cleanly separate progress-tracking code and functional code? (Can there be a progress-tracking decorator or something?)
It seems like this code would benefit from the null object pattern.
# a progress bar that uses ProgressMeter
class RealProgressBar:
    def setMaximum(self, max):
        self.pm = progress.ProgressMeter(total=max)
        self.pc = 0
    def progress(self):
        self.pc += 1
        if self.pc % 100 == 0:
            self.pm.update(self.pc)
# a fake progress bar that does nothing
class NoProgressBar:
    def setMaximum(self, max):
        pass
    def progress(self):
        pass

# Iterate with a given progress bar
def bigIteration(collection, progressBar=NoProgressBar()):
    progressBar.setMaximum(len(collection))
    for element in collection:
        progressBar.progress()
        doWork(element)

bigIteration(collection, RealProgressBar())
(Pardon my French, er, Python, it's not my native language ;) Hope you get the idea, though.)
This lets you move the progress update logic from the loop, but you still have some progress related calls in there.
You can remove this part if you create a generator from the collection that automatically tracks progress as you iterate it.
# turn a collection into one that shows progress when iterated
def withProgress(collection, progressBar=NoProgressBar()):
    progressBar.setMaximum(len(collection))
    for element in collection:
        progressBar.progress()
        yield element

# simple iteration function
def bigIteration(collection):
    for element in collection:
        doWork(element)

# let's iterate with progress reports
bigIteration(withProgress(collection, RealProgressBar()))
This approach leaves your bigIteration function as is and is highly composable. For example, let's say you also want to add cancellation this big iteration of yours. Just create another generator that happens to be cancellable.
# highly simplified cancellation token
# probably needs synchronization
class CancellationToken:
    def __init__(self):
        self.cancelled = False
    def isCancelled(self):
        return self.cancelled
    def cancel(self):
        self.cancelled = True

# iterates a collection with cancellation support
def withCancellation(collection, cancelToken):
    for element in collection:
        if cancelToken.isCancelled():
            break
        yield element
progressCollection = withProgress(collection, RealProgressBar())
cancellableCollection = withCancellation(progressCollection, cancelToken)
bigIteration(cancellableCollection)
# meanwhile, on another thread...
cancelToken.cancel()
You could rewrite bigIteration as a generator function as follows:
def bigIteration(collection):
    for element in collection:
        doWork(element)
        yield element
Then, you could do a great deal outside of this:
mycollection = [1, 2, 3]

if progressBar:
    pm = progress.ProgressMeter(total=len(mycollection))
    pc = 0
    for item in bigIteration(mycollection):
        pc += 1
        if pc % 100 == 0:
            pm.update(pc)
else:
    for item in bigIteration(mycollection):
        pass
My approach would be like this:
The looping code yields the progress percentage whenever it changes (or whenever it wants to report it). The progress-tracking code then reads from the generator until it's empty; updating the progress bar after every read.
However, this also has some disadvantages:
Even without a progress bar you need a helper to drain the generator, since the loop body only runs as the generator is consumed.
You cannot easily return a value at the end. One solution is to wrap the return value, so the tracking code can tell whether the function yielded a progress update or a return value. Actually, it might be nicer to wrap the progress updates so the regular return value can be yielded unwrapped - but that would require much more wrapping, since it would need to be done for every progress update instead of just once.
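A minimal sketch of that yield-the-percentage approach (illustrative names; `doWork` is a stand-in for the real per-element work):

```python
def doWork(element):
    pass  # stand-in for the real per-element work

def bigIteration(collection):
    # the loop yields its progress percentage after each element
    total = len(collection)
    for i, element in enumerate(collection, 1):
        doWork(element)
        yield i * 100 // total

def runWithProgress(gen, report=print):
    # the tracking side drains the generator, reporting each change
    last = None
    for pct in gen:
        if pct != last:
            report(pct)
            last = pct
```

Swapping `report` for a no-op (or for `pm.update`) toggles the progress display without touching the loop logic, which is the separation the question asks for.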
