Telethon: download channel videos and pictures based on messages' reactions - python

I am trying to figure out a way to scan through all the unread messages in a Telegram channel and download the videos and images that have more than a certain number of reactions.
I got to the point where the script downloads all the unread videos and images, but I am stuck on how to filter those messages based on reactions. For example: only download the videos and images that have at least 3 👍 reactions and/or 3 ❤️ reactions.
My code is included below. The script downloads all the unread videos and images from the channel named title and stores them in a sub-folder called title. As I said, I don't want to download all the videos and images, only the ones that meet certain reaction thresholds. Any suggestions or ideas would be greatly appreciated!
from telethon import functions
from telethon.sync import TelegramClient
from telethon.tl.functions.messages import GetDialogsRequest

title = 'channel name'
with TelegramClient(username, api_id, api_hash) as client:
    chat_names = client(GetDialogsRequest(
        offset_date=None,
        offset_id=0,
        offset_peer='username',
        limit=0,
        hash=0
    ))
    result = client(functions.messages.GetPeerDialogsRequest(
        peers=[title]
    ))
    for chat in chat_names.chats:
        if chat.title == title:
            for message in client.iter_messages(title, limit=result.dialogs[0].unread_count):
                if message.photo or message.video:
                    # Save into a sub-folder named after the channel
                    message.download_media('./' + str(title) + '/')

There is an easier way to fetch dialogs than GetDialogsRequest, which is raw API: use client.get_dialogs instead.
# Fetch the first 100 dialogs (remove the 100 to fetch all of them)
# I have renamed `chat_names` to `dialogs` to make it clearer.
dialogs = client.get_dialogs(100)
The second call to GetPeerDialogsRequest is also unnecessary. It's meant to be used when you want to fetch dialog information about a particular user or chat - but you already fetched all dialog information before.
(Technically, we could remove client.get_dialogs and only use GetPeerDialogsRequest, but for simplicity I won't do that.)
To check reactions given a message you can access the message.reactions field (as seen in the raw API for Message):
print(message.reactions.stringify())
This will give you a feel for what the object looks like with your Telethon version, in my case:
MessageReactions(
    results=[
        ReactionCount(
            reaction=ReactionEmoji(
                emoticon='❤'
            ),
            count=11,
            chosen_order=None
        ),
        ...
    ]
)
So all that's left is checking if a message meets your needs. I'll make a separate function to make it easier to read:
def should_download(message):
    if not (message.photo or message.video):
        return False  # no photo or video, no need to download
    if not message.reactions:
        return False  # no reactions
    for reaction in message.reactions.results:
        # It might be a ReactionCustomEmoji which doesn't have an emoticon
        # Use getattr to read the emoticon field or return None if it doesn't exist
        emoticon = getattr(reaction.reaction, 'emoticon', None)
        if emoticon in ('❤', '👍') and reaction.count >= 3:
            return True  # has enough reactions
    return False  # did not find the reactions we wanted
And finally the loop can use the function:
for dialog in dialogs:
    if dialog.title == title:
        # We can use the dialog instead of the title as the chat
        for message in client.iter_messages(dialog, dialog.unread_count):
            if should_download(message):
                message.download_media('./' + str(title) + '/')
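If you want a different threshold per emoji, the check can be driven by a dictionary. The following is only a sketch: meets_thresholds and fake_message are hypothetical names, and the stand-in objects merely mimic the shape printed by stringify() above, so it runs without a Telegram connection. It also strips the variation selector (U+FE0F), on the assumption that an emoji typed as ❤️ may otherwise fail to match the plain '❤' shown in the output above.

```python
from types import SimpleNamespace

VARIATION_SELECTOR = '\ufe0f'  # invisible code point that often trails emoji such as the red heart


def normalize(emoji):
    """Drop the variation selector so both spellings of an emoji compare equal."""
    return emoji.replace(VARIATION_SELECTOR, '')


def meets_thresholds(message, thresholds):
    """Return True if any wanted emoji reaches its minimum count on the message."""
    if not getattr(message, 'reactions', None):
        return False
    wanted = {normalize(emoji): minimum for emoji, minimum in thresholds.items()}
    for result in message.reactions.results:
        # Custom emoji reactions have no emoticon attribute, so skip them
        emoticon = getattr(result.reaction, 'emoticon', None)
        if emoticon is None:
            continue
        minimum = wanted.get(normalize(emoticon))
        if minimum is not None and result.count >= minimum:
            return True
    return False


def fake_message(counts):
    """Hypothetical stand-in for a Telethon message, for illustration only."""
    results = [
        SimpleNamespace(reaction=SimpleNamespace(emoticon=emoji), count=count)
        for emoji, count in counts.items()
    ]
    return SimpleNamespace(reactions=SimpleNamespace(results=results))


# The heart is written with the variation selector on purpose
thresholds = {'👍': 3, '\u2764\ufe0f': 5}
print(meets_thresholds(fake_message({'\u2764': 5}), thresholds))
print(meets_thresholds(fake_message({'👍': 2}), thresholds))
```

With a real message you would still check message.photo or message.video first, as should_download does.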

Related

how to specify if the message was from group or private in python telegram bot

Here is my code:
# 1st method
if chat.type == 'supergroup':
    # Check if the bot's name was mentioned in the message
    # if bot_name in message_text:
    # Generate a response to the question
    response_text = generate_response(message_text)
    update.message.reply_text(response_text)
elif chat.type == 'private':
    # Generate a response to the question
    response_text = generate_response(message_text)
    update.message.reply_text(response_text)
The 'private' branch works fine when the message is sent in a private chat, but when the message is sent from the group, the bot does not pick it up, even though the bot is a member of the group.
I tried:
# 2nd method
if chat.id < 0:
# 3rd method
if message.chat.type in ["group", "supergroup"]:
to find out whether the message comes from the group, but had no luck.
Only the private case works.
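By default, bots don't see all messages that are written in group chats. This behaviour is called privacy mode, and it can be turned off through @BotFather with the /setprivacy command. Please have a look at this FAQ entry in the official Telegram docs for more info.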

Telethon (or Pyrogram) forward whole album instead of last media without caption

I'm trying to forward the last message from a specific channel using Telethon's client.iter_messages(channel_id, limit=1) and message.forward_to(group_id)
The problem is if the message is an album, only the last image of the album gets forwarded. Also caption is not included.
The exact same thing happens with Pyrogram's app.get_chat_history(channel_id, limit=1) and message.forward(group_id)
The problem itself, I suppose, lies in the way ids work for media in albums.
I need a way to somehow forward the whole message.
Note: I am aware of Telethon's events.Album but don't see any way to implement it in my case. Also this (Forward message (album) telethon) somehow relates to my problem but again I don't know how to make it work properly. Also if message.grouped_id: might help.
Code (Telethon variant):
from telethon import TelegramClient

client = TelegramClient('telethon-client', api_id, api_hash)

async def main():
    async for message in client.iter_messages(test_channel_id, limit=1):
        await message.forward_to(test_group_id)

with client:
    client.loop.run_until_complete(main())
Quoting the question: "The problem is if the message is an album, only the last image of the album gets forwarded. Also caption is not included."
This is because albums are in fact separate messages. "The last message" is the last photo of the album, and clients often put the caption in the first image of the album.
Quoting the question: "The problem itself, I suppose, lies in the way ids work for media in albums. I need a way to somehow forward the whole message."
You are correctly forwarding the "whole" message. The problem is you want to forward multiple messages at the same time.
Quoting the question: "Also if message.grouped_id: might help"
Yes, you can use that to detect what messages belong to the same group. An album consists of up to ten messages that share the grouped_id.
You can do something like this:
current_album = None
current_group_id = None
async for message in client.iter_messages(test_channel_id):
    if current_group_id and message.grouped_id != current_group_id:
        print('Group finished:', current_group_id, 'had', len(current_album))
        current_group_id = None
        current_album = None
    if not current_group_id and message.grouped_id:
        print('New group:', message.grouped_id)
        current_group_id = message.grouped_id
        current_album = []
    if current_group_id:
        current_album.append(message)
This will tell you how many messages are found in a group. Note that messages that have a grouped_id can be interleaved with messages that don't, in which case the above code fails. For example, this sequence is possible:
msg_id 1001, grouped_id 4001
msg_id 1002, grouped_id None
msg_id 1003, grouped_id 4001
Here the code will detect the 4001 album twice, each time with one message. You can adapt the code to attempt to handle this. It is not possible to determine an album's length beforehand; you just need to fetch more messages to find out whether they're in the same group.
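One possible way to cope with interleaving is to collect messages in a dictionary keyed by grouped_id instead of tracking only the current group. This is a minimal sketch, not the only approach: group_albums is a hypothetical helper, and the SimpleNamespace objects are stand-ins for Telethon messages so the example runs offline.

```python
from types import SimpleNamespace


def group_albums(messages):
    """Collect messages that share a grouped_id, even when other messages sit between them."""
    albums = {}
    for message in messages:
        if message.grouped_id is None:
            continue  # not part of any album
        albums.setdefault(message.grouped_id, []).append(message)
    return albums


# Stand-in messages reproducing the interleaved sequence described above
sample = [
    SimpleNamespace(id=1001, grouped_id=4001),
    SimpleNamespace(id=1002, grouped_id=None),
    SimpleNamespace(id=1003, grouped_id=4001),
]

albums = group_albums(sample)
for grouped_id, album in albums.items():
    print(grouped_id, [m.id for m in album])
```

With real data you would feed it a bounded number of recent messages (for example the result of client.get_messages with a limit), sort each album by message id, and then pass the list to client.forward_messages so that all parts are forwarded in one call.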

Is there a way to access every website's data through discord.py

Hi stalkers
Is there a way to access a site's data directly?
I need it for my code :
@commands.command(aliases=['isitsafe', 'issafe', 'scanlink'])
async def isthissafe(self, ctx, link: str):
    try:
        # Build a link to Google's Transparency Report page for the given URL
        link = 'https://transparencyreport.google.com/safe-browsing/search?url=' + link.replace('/', '%2F')
        embed = discord.Embed(
            color=discord.Color.dark_red(),
            title='',
            description=f"[Transparency Report verification]({link})")
        await self.emb(embed, ctx.author.name, 'https://cwatch.comodo.com/images-new/check-my-site-security.png')
        await ctx.send(embed=embed)
    except Exception:
        await ctx.send('An error has occurred')
        print('\nERROR')
Basically, I made a command that should tell whether a link is safe or not. I did it using Google's Transparency Report site, but the problem is that I only reformat the link so the bot sends it in an embed, and you open the report from there.
My question is: is there some way the bot could directly output the message from the website that indicates whether the site is malicious or safe?
Please help me.
I have also provided an image showing the message I want to get from the site.
You might want to try scraping the site with bs4, or just look for the string "No unsafe content found". However, it looks like Google populates that field through a separate request, so the text may not be in the initial HTML.
Your best bet would be to use transparencyreport.google.com/transparencyreport/api/v3/safebrowsing/status?site=SITE_HERE. It returns a JSON response, but I don't understand its format, so play around with it and figure out what the keys mean.
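Whichever endpoint you call, the link to check should be fully percent-encoded before it goes into the query string; replacing only '/' leaves characters such as '?', '&' and '#' untouched. A small sketch using the standard library follows. build_status_url is a hypothetical helper, the endpoint is the one mentioned above, and nothing is assumed about the shape of the response.

```python
from urllib.parse import urlencode

# Endpoint taken from the answer above; its response format is not documented here
STATUS_ENDPOINT = 'https://transparencyreport.google.com/transparencyreport/api/v3/safebrowsing/status'


def build_status_url(site):
    """Return the status URL with the site safely encoded as a query parameter."""
    return STATUS_ENDPOINT + '?' + urlencode({'site': site})


print(build_status_url('https://example.com/page?id=1&x=2'))
```

The resulting URL can then be fetched with an HTTP client of your choice (in a discord.py bot, preferably an asynchronous one such as aiohttp, so the event loop is not blocked).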

is there a way for my discord bot to access the tenor website and then just add all gifs under a certain tag to it's output?

Title says it all. I have a Discord bot that uploads cat gifs whenever a certain keyword or command is used. With my current code, I have to manually add each Tenor gif link to the output set so the bot can display that gif. Instead, I want the bot to post any cat gif from Tenor or any other gif website. I'm pretty sure those websites have a tag feature that assigns, for example, the tag "cat" to a cat gif. I want to find out which gifs are tagged "cat" and add them to the bot's output set. Is there a way I can do this?
import discord
import os
import random

client = discord.Client()
cat_pictures = ["cats", "cat"]
cat_encouragements = [
    "https://tenor.com/view/dimden-cat-cute-cat-cute-potato-gif-20953746",
    "https://tenor.com/view/dimden-cute-cat-cute-cat-potato-gif-20953747",
    "https://tenor.com/view/cute-cat-cute-cat-dimden-gif-19689251",
    "https://tenor.com/view/dimden-cute-cat-cute-cat-potato-gif-21657791",
    "https://tenor.com/view/cats-kiss-gif-10385036",
    "https://tenor.com/view/cute-kitty-best-kitty-alex-cute-pp-kitty-omg-yay-cute-kitty-munchkin-kitten-gif-15917800",
    "https://tenor.com/view/cute-cat-oh-yeah-awesome-cats-amazing-gif-15805236",
    "https://tenor.com/view/cat-broken-cat-cat-drinking-cat-licking-cat-air-gif-20661740",
    "https://tenor.com/view/funny-animals-cute-chicken-cat-fight-dinner-time-gif-8953000"]

@client.event
async def on_ready():
    print('We have logged in as catbot '.format(client))

@client.event
async def on_message(message):
    if message.author == client.user:
        return
    if message.content.startswith('!help'):
        await message.channel.send('''
I only have two commands right now which are !cat which posts an image of a cat. !cats which gives you a video/gif
''')
    if any(word in message.content for word in cat_pictures):
        await message.channel.send(random.choice(cat_encouragements))
    # cat_apology, cat_sad, cat_dog and cat_dogs are not shown in the question
    if any(word in message.content for word in cat_apology):
        await message.channel.send(random.choice(cat_sad))
    if any(word in message.content for word in cat_dog):
        await message.channel.send(random.choice(cat_dogs))

client.run(os.getenv('TOKEN'))
If you want to get data from another page, you have to learn how to "scrape".
For some pages you can use requests (or urllib) to get the HTML from the server and beautifulsoup (or lxml) to search for data in the HTML. Pages often use JavaScript to add elements, so you may need Selenium to control a real web browser that can run JavaScript (requests, urllib, beautifulsoup and lxml can't run JavaScript).
But first you should check whether the page has an API for developers that returns the data in a simpler form, as JSON, so you don't have to search the HTML.
As @ChrisDoyle noticed, there is documentation for the Tenor API.
This documentation even shows an example in Python (using requests) that gets JSON data. The example only lacks a step showing how to get the URLs out of the JSON, because the response holds other information too, such as image sizes and several formats (gif, small gif, animated gif, mp4, etc.).
This is my version, based on the example from the documentation:
import requests

# set the apikey and limit
API_KEY = "LIVDSRZULELA"  # test value
search_term = "cat"

def get_urls(search, limit=8):
    payload = {
        'key': API_KEY,
        'limit': limit,
        'q': search,
    }
    # our test search
    # get the top 8 GIFs for the search term
    r = requests.get("https://g.tenor.com/v1/search", params=payload)
    results = []
    if r.status_code == 200:
        data = r.json()
        #print('[DEBUG] data:', data)
        for item in data['results']:
            #print('[DEBUG] item:', item)
            for media in item['media']:
                #print('[DEBUG] media:', media)
                #for key, value in media.items():
                #    print(f'{key:10}:', value['url'])
                #print('----')
                if 'tinygif' in media:
                    results.append(media['tinygif']['url'])
    else:
        results = []
    return results

# --- main ---
cat_encouragements = get_urls('cat')
for url in cat_encouragements:
    print(url)
This gives URLs that point directly to the tiny GIF images:
https://media.tenor.com/images/eff22afc2220e9df92a7aa2f53948f9f/tenor.gif
https://media.tenor.com/images/e0f28542d811073f2b3d223e8ed119f3/tenor.gif
https://media.tenor.com/images/75b3c8eca95d917c650cd574b91db7f7/tenor.gif
https://media.tenor.com/images/80aa0a25bee9defa1d1d7ecaab75f3f4/tenor.gif
https://media.tenor.com/images/042ef64f591bdbdf06edf17e841be4d9/tenor.gif
https://media.tenor.com/images/1e9df4c22da92f1197b997758c1b3ec3/tenor.gif
https://media.tenor.com/images/6562518088b121eab2d19917b65ee793/tenor.gif
https://media.tenor.com/images/eafc0f0bef6d6fd135908eaba24393ac/tenor.gif
If you uncomment some of the print() calls in the code, you can see more information.
For example, these are the links from media.items() for a single image:
nanowebm : https://media.tenor.com/videos/513b211140bedc05d5ab3d8bc3456c29/webm
tinywebm : https://media.tenor.com/videos/7c1777a988eedb267a6b7d7ed6aaa858/webm
mp4 : https://media.tenor.com/videos/146935e698960bf723a1cd8031f6312f/mp4
loopedmp4 : https://media.tenor.com/videos/e8be91958367e8dc4e6a079298973362/mp4
nanomp4 : https://media.tenor.com/videos/4d46f8b4e95a536d2e25044a0a288968/mp4
tinymp4 : https://media.tenor.com/videos/390f512fd1900b47a7d2cc516dd3283b/mp4
tinygif : https://media.tenor.com/images/eff22afc2220e9df92a7aa2f53948f9f/tenor.gif
mediumgif : https://media.tenor.com/images/c90bf112a9292c442df9310ba5e140fd/tenor.gif
nanogif : https://media.tenor.com/images/6f6eb54b99e34a8128574bd860d70b2f/tenor.gif
gif : https://media.tenor.com/images/8ab88b79885ab587f84cbdfbc3b87835/tenor.gif
webm : https://media.tenor.com/videos/926d53c9889d7604da6745cd5989dc3c/webm
In the code I use API_KEY = "LIVDSRZULELA" from the documentation, but you should register on the site to get your own unique API_KEY.
API keys shown in documentation usually have restrictions or always generate the same data. They are created only for tests, not for use in a real application.
The documentation shows more methods to get and filter images, for example to get trending images.
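If you want a format other than tinygif, the parsing step can be separated from the request. The sketch below assumes the response layout used in get_urls above (a 'results' list whose items hold a 'media' list of format dictionaries); extract_urls is a hypothetical helper, and the sample dictionary is hand-built from two of the URLs listed above rather than a real API response.

```python
def extract_urls(data, media_format='tinygif'):
    """Pull the URL of one media format out of a Tenor-style search response."""
    urls = []
    for item in data.get('results', []):
        for media in item.get('media', []):
            if media_format in media:
                urls.append(media[media_format]['url'])
    return urls


# Hand-built sample in the same shape as the JSON handled by get_urls
sample = {
    'results': [
        {
            'media': [
                {
                    'tinygif': {'url': 'https://media.tenor.com/images/eff22afc2220e9df92a7aa2f53948f9f/tenor.gif'},
                    'mp4': {'url': 'https://media.tenor.com/videos/146935e698960bf723a1cd8031f6312f/mp4'},
                }
            ]
        }
    ]
}

print(extract_urls(sample))
print(extract_urls(sample, 'mp4'))
```

In the bot, the returned list could replace the hard-coded cat_encouragements list, so random.choice picks from fresh search results.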

Invite Discord.py

I am trying to read the members of a guild by invitation. As soon as we send a link from a server, information such as name, online membership and total members appears. How do I get this information? (The library in question is discord.py)
Example:
(I can't share an image, so I ask you to open the photo link:)
https://cdn.discordapp.com/attachments/842566116978327584/844421408019841034/unknown.png
Name: ☕|Clube do café #240|☕
Photo url: https://cdn.discordapp.com/icons/828004701148676137/62f745ee62f7fb6dd7fbc34a6b75f2df.png?size=128
Online: 112
Members: 232
Id: 806673124819992688
(I extracted this information manually, but was wondering how to do this in the code)
I have already tried the following message attributes: attachments, embeds, guild, stickers and system_content. All of them returned either nothing or just the invite link.
This is possible with the help of the fetch_invite() function:
invite = await client.fetch_invite(url="https://discord.gg/Invite-ID")
You can then retrieve your desired attributes.
For example:
memberCount = invite.approximate_member_count
presenceCount = invite.approximate_presence_count
guildName = invite.guild.name
guildIcon = invite.guild.icon_url  # discord.py 1.x; in 2.x this is invite.guild.icon.url
guildID = invite.guild.id
And so on.
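To use this on links that users post, you first need to pull the invite out of the message text. The following is a rough sketch with a regular expression: find_invite_codes and the pattern are assumptions that cover only the common discord.gg and discord.com/invite forms, not every possible link format.

```python
import re

# Assumed pattern for the common invite link forms; captures the invite code
INVITE_PATTERN = re.compile(
    r'(?:https?://)?(?:www\.)?(?:discord\.gg|discord(?:app)?\.com/invite)/([A-Za-z0-9-]+)'
)


def find_invite_codes(text):
    """Return every invite code found in the given message text."""
    return INVITE_PATTERN.findall(text)


print(find_invite_codes('Join us: https://discord.gg/abc123 or discord.com/invite/xyz-9'))
```

Inside on_message, each code found in message.content can then be passed to client.fetch_invite, which accepts either the full URL or the bare invite code.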
