Doing hour-long tasks in a non-blocking way [Python: Reddit PRAW]

I'm creating a Reddit bot using PRAW (Python Reddit API Wrapper) for a subreddit I moderate. The bot comments on new submissions asking the poster to comment on their own post, as required by the posting rules. If the poster has not commented on their post within one hour, the bot should remove the post. The sequence of events looks like this:
1. A post is made.
2. The bot comments on the post telling the poster that they have one hour to add a comment to the post.
3. An hour passes.
4. If the user has not commented on their post, the post is removed. Otherwise, no action is taken.
The problem I have is with waiting for one hour. I can't use sleep() to block for an hour because the bot needs to process other posts made in that time frame (posts arrive roughly every fifteen minutes, so sleeping for an hour would cause the bot to fall behind). I also don't think I can use polling, since checking for submissions blocks the thread. To elaborate, I am checking for new submissions with for submission in subreddit.stream.submissions(skip_existing=True):, where subreddit.stream.submissions() is a generator/stream that yields whenever someone submits a post to the subreddit (documentation here).
So at this point, I'm completely lost on where to go. I need to create a new task for every post that is made, one that runs through steps 1 to 4 without blocking other identical tasks from being created whenever another post is submitted. If you could give me a pointer on which direction to go or how I might do this, I would be grateful. In case you missed it, I'm using Python.

You might want to use RQ (Redis Queue). It adds a new dependency to your application, but it gives you exactly what you want. You can refer to the docs here.
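A rough sketch of how that could look for this bot, assuming RQ's built-in scheduler (rq >= 1.2, with a worker started via rq worker --with-scheduler) and a hypothetical check_poster_commented job you would write yourself:

from datetime import timedelta

import praw
from redis import Redis
from rq import Queue

queue = Queue(connection=Redis())

def check_poster_commented(submission_id):
    # Hypothetical job: re-fetch the submission an hour later and remove
    # it if the author never commented. Assumes a praw.ini site named "bot".
    reddit = praw.Reddit("bot")
    submission = reddit.submission(submission_id)
    author_commented = any(
        comment.author == submission.author
        for comment in submission.comments.list()
    )
    if not author_commented:
        submission.mod.remove()

# In the submission stream loop, schedule the check one hour out:
# queue.enqueue_in(timedelta(hours=1), check_poster_commented, submission.id)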

To me this looks like a job for threading.Timer. Example usage:
import threading

def do_action(x):
    print(f'Doing {x}')

t1 = threading.Timer(30.0, do_action, ['A'])
t1.start()
t2 = threading.Timer(20.0, do_action, ['B'])
t2.start()
t3 = threading.Timer(10.0, do_action, ['C'])
t3.start()
This will print Doing C, Doing B, Doing A, with 10 seconds between each action.
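Applied to the bot, a minimal sketch (my own illustration, with remove_if_no_comment as a hypothetical function you would implement, and subreddit being the PRAW Subreddit object from the question) could start one timer per submission so the stream loop never blocks:

import threading

def remove_if_no_comment(submission):
    # Hypothetical check: remove the post if the author never commented on it.
    author_commented = any(
        comment.author == submission.author
        for comment in submission.comments.list()
    )
    if not author_commented:
        submission.mod.remove()

for submission in subreddit.stream.submissions(skip_existing=True):
    submission.reply("Please add a comment within one hour or this post will be removed.")
    # Schedule the check for one hour later without blocking the stream.
    timer = threading.Timer(3600.0, remove_if_no_comment, [submission])
    timer.daemon = True  # don't keep the process alive just for pending timers
    timer.start()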

Related

discord.py invites - approximate_presence_count API gradually becomes slower

TL;DR: I'm querying an invite link's approximate_presence_count every 10 seconds, and over a very long period it gradually stops detecting presence changes. How can I fix this?
Goal
I'm writing a discord bot which monitors the number of online (and other statuses) members in several large (>100 members) servers I'm a member of. The bot is not a member of any of the relevant servers, and should log the number of members every 10 seconds or so.
This is not an XY problem: I do not want the bot to be a member of the servers; I simply want it to use approximate_presence_count from invite links.
Method
To do this, I've made permanent invite links to each of the servers, and I query their approximate_presence_count at 10-second intervals via a tasks.loop, logging those values to a text file.
Additionally, I have a small testing server in which I have several friends who log on and off, to test whether the member count is working.
All intents are enabled in the developer portal. This is not an intents-related issue.
Problem
During testing on my small testing server, while running the bot over approximately a 24-hour period, I noticed that it becomes slower and slower to detect changes in approximate_presence_count after one of my friends logs on or off Discord. I've reproduced this on several different days. While there is some minor variation in the time it takes approximate_presence_count to update at any given moment, presumably because the Discord backend is under variable load, the trend is consistent.
After about 20-24 hours, the approximate_presence_count becomes almost useless, rarely detecting any changes.
Expected result: Delay between logon/logoff and change in approximate_presence_count remains constant
Actual result: Delay between logon/logoff and change in approximate_presence_count gradually increases
What I've tried
In addition to the code below, I also tried not fetching the invites on every iteration of the logger loop, but that did not help either.
I can reproduce the issue on more than one network, and on more than one machine.
Minimal reproducible example
The code below is extracted from the bot and should contain only the relevant components. There may be mistakes in the extraction process, but the gist remains the same.
import datetime

import discord
from discord.ext import tasks

TOKEN = 'removed'
INTENTS = discord.Intents.all()
links = ['discord.gg/foobarbaz', 'discord.gg/fillertext']  # real invites removed

client = discord.Client(intents=INTENTS)

@tasks.loop(seconds=10)
async def logger():
    invites = [await client.fetch_invite(i, with_counts=True) for i in links]  # invite objects
    counts = [getattr(i, 'approximate_presence_count') for i in invites]  # presence counts
    with open('logs.txt', 'a') as file:
        file.write(datetime.datetime.today().strftime("%d/%m/%Y, %H:%M:%S ") + ','.join(map(str, counts)) + '\n')

@client.event
async def on_ready():
    logger.start()

client.run(TOKEN)
Final notes
The expected delay in approximate_presence_count
In my testing, when this issue does not occur, the delay between logon/logoff and the change in approximate_presence_count is between 5 and 40 seconds, with perhaps 1 in 100 taking up to 60 seconds.
Number of invites tracked
The bot is currently tracking 6 invite links, so the frequency of requests to Discord is 0.6/second on average. Is this enough to cause a rate limit, perhaps? As EricJin mentioned in the comments, this is unlikely.
This is most likely an issue on Discord's end, as suggested in the comments. Discord has changed how invites work, including how permanent invite links work. While the article doesn't describe the internal changes, it does describe how the new invites behave in the app/site. The timing of the change to invites lines up with roughly when this post was made (May 2022), and there have been further updates since.
It seems you may have run into an issue that was present when those changes were made, and which appears to be fixed as of the time of this answer.

Python asyncio bot with MongoDB

My experience developing bots is quite limited and I recently ran into some problems... I need your help and suggestions; this would be a great learning experience for me.
I need to make an asyncio bot that automatically awards points to users (about 20,000 users) every 10 minutes, without lag and without stopping the bot. My current solution uses a Thread (from the threading library), but it takes a long time (about 20 minutes).
Question: Is there a better solution than this? I'm pretty sure it's not the most efficient way to solve my problem.
while True:
    # Thread for the function that accrues points
    pay_users = threading.Thread(target=pay_function, args=[state, money])
    # Start the thread
    pay_users.start()
    while True:
        # Check whether the thread has finished
        if pay_users.is_alive():
            await asyncio.sleep(5)
        else:
            # If the thread has completed, join it
            pay_users.join()
            break
    # Every 10 minutes
    await asyncio.sleep(600)
By the way, there is one other question that concerns me. Let's say each user has a bonus that can be activated every 24 hours. After taking the bonus, the user can click a button to check how much time is left until the bonus is restored. I want the user to be notified the moment the bonus becomes available again. The trivial solution I came up with is to update a date field in the database to the time of the click when the bonus is taken, but with that approach I don't know how to send the notification when the bonus is restored.
Question: Is there a way to run an individual countdown for each user, and call a function that notifies the user when it finishes?
Thank you very much in advance for your attention and trying to help me!
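A minimal asyncio-native sketch of the same periodic accrual, assuming the pay_function, state, and money objects from the snippet above, would hand the blocking work to a worker thread with asyncio.to_thread (Python 3.9+) instead of polling is_alive():

import asyncio

async def accrue_points_forever(state, money):
    while True:
        # Run the blocking accrual in a worker thread and await its result,
        # so the event loop stays free for other bot handlers.
        await asyncio.to_thread(pay_function, state, money)
        # Wait 10 minutes before the next accrual run.
        await asyncio.sleep(600)

# During bot startup, on the running event loop:
# asyncio.create_task(accrue_points_forever(state, money))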

Tweepy api.list_direct_messages() updates slowly

Hello and thanks for taking the time to try to answer my question. I'll be as direct and specific as possible.
Using Tweepy, I'm trying to get the ID of the last message in my DMs with this method:
import tweepy

auth = tweepy.OAuthHandler(token[0], token[1])
auth.set_access_token(token[2], token[3])
api = tweepy.API(auth, wait_on_rate_limit=True, wait_on_rate_limit_notify=True)

last_dm = api.list_direct_messages(1)
for messages in last_dm:
    print(messages.message_create['sender_id'])
    if not (messages.message_create['sender_id'] == my_id):
        send_message()
This works as expected; however, something weird happens right after. If I run this program once it works, but if I run it again within three or so minutes it doesn't register that the sender ID has changed. Any time after that it works, so I think there's some sort of lag coming from Tweepy.
My question is: is there a way around this? If not with Tweepy, what about another library or a different language like JavaScript?
There is a maximum rate limit for all API calls to Twitter; with Tweepy you can let the library deal with it by setting wait_on_rate_limit=True.
In that case Tweepy will slow down (hold the requests) to stay within the limits, with the advantage that you will not get an error/exception in the application.
In a similar implementation I call api.list_direct_messages once every 60 seconds and I don't hit the rate limit (obviously I cannot respond in real time, but that is a decision which depends on the context and business logic).
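A minimal sketch of that 60-second polling approach, reusing the api object and message fields from the question and treating handle_message as a hypothetical callback you would supply:

import time

def poll_direct_messages(api, my_id, handle_message):
    # Fetch the newest DM once a minute and hand it off when it was
    # sent by someone other than the bot's own account.
    while True:
        for message in api.list_direct_messages(1):
            sender_id = message.message_create['sender_id']
            if sender_id != my_id:
                handle_message(message)
        time.sleep(60)  # stay well within Twitter's DM rate limits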

django celery for long running task

I have a list of players, like:
player_list = Participant.objects.all()
participant_count = player_list.count()
I want to randomly select a winner from this list, like:
winner_index = random.randint(0, participant_count-1)
winner = player_list[winner_index]
Let's say I have one million participants; then I guess it will take a long time to randomly pick the winner, and until then my site will hang.
For this purpose should I use Celery, or is it fine as is? What if my site hangs for a few minutes and only then displays the winner? Any suggestions?
With proper indexing, your database should be able to handle this without any special workarounds. If you make it asynchronous with Celery, you won't be able to include that data in your standard request/response cycle.
If you're worried about page speed for the user, you could load the page without the winner, then make an AJAX call from JavaScript to fetch the winner and update the page, allowing you to display a loading message to the user while they wait.
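A minimal sketch of that AJAX endpoint, assuming the Participant model from the question and a hypothetical URL pattern pointing at pick_winner:

import random

from django.http import JsonResponse

from .models import Participant  # assumed location of the model from the question

def pick_winner(request):
    # Count once, pick a random offset, and fetch only that row;
    # the slice translates to LIMIT 1 OFFSET n in SQL.
    participant_count = Participant.objects.count()
    winner_index = random.randint(0, participant_count - 1)
    winner = Participant.objects.all()[winner_index]
    return JsonResponse({'winner_id': winner.pk})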

How to use thread in Django

I want to periodically check users' subscription dates and send mail to users whose subscription is about to end (e.g. two days remaining).
I think the best way is using a thread and a timer to check the dates, but I have no idea how to call this function. I don't want to make a separate program or shell script; I want to integrate this procedure into my Django code. I tried calling the function in my settings.py file, but that doesn't seem like a good idea: it calls the function and creates a thread every time settings is imported.
That's a case for a manage.py command called periodically from cron. The official docs about creating those commands are here; this is a bit more helpful.
If you want something simpler, django-command-extensions has commands for managing Django jobs.
If you need more than just this one asynchronous job, have a look at Celery.
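A minimal sketch of such a management command, assuming the check_subscription_finishing helper used in the django-cron example below and a file placed at yourapp/management/commands/notify_expiring_subscriptions.py (the file name is hypothetical and becomes the command name):

from django.core.mail import send_mass_mail
from django.core.management.base import BaseCommand

class Command(BaseCommand):
    help = "Email users whose subscription is about to end."

    def handle(self, *args, **options):
        # check_subscription_finishing() is assumed to return the
        # (subject, message, from_email, recipient_list) tuples.
        datatuple = check_subscription_finishing()
        send_mass_mail(datatuple)

Cron would then run python manage.py notify_expiring_subscriptions on whatever schedule you need.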
Using django-cron is much easier and simpler.
EDIT: Added a tip
from django_cron import cronScheduler, Job

class sendMail(Job):
    # run every 300 seconds (5 minutes)
    run_every = 300

    def job(self):
        # This will be executed every 5 minutes
        datatuple = check_subscription_finishing()
        send_mass_mail(datatuple)

# ...and just register it
cronScheduler.register(sendMail)
