Python Pytube progress for playlist download

I am writing a program in Python using pytube, and I want to indicate progress when downloading a playlist. When downloading a single video I can do:
YouTube(url, on_progress_callback=progressFunction)
but that doesn't work when downloading a playlist:
Playlist(url, on_progress_callback=progressFunction)
I get the following error:
TypeError: __init__() got an unexpected keyword argument 'on_progress_callback'
Is there any way to get the progress when downloading a playlist?

Hey, you can get all the URLs from the Playlist and then download them one by one.
This works for me, all the best.
from pytube import Playlist

def getAllLinks(playList):
    '''
    This function takes the link of a playlist and returns the link of each video.
    :param playList: URL of the playlist
    :return: A list of all video URL links
    '''
    allLinks = []
    youtubeLink = 'https://www.youtube.com'
    pl = Playlist(playList)
    for linkprefix in pl.parse_links():
        allLinks.append(youtubeLink + linkprefix)
    return allLinks
From this you will get all the URLs, and then:

def downloadPlaylist(playlistLink):
    linkArray = getAllLinks(playlistLink)
    for link in linkArray:
        downloadVideo(link)
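The downloadVideo helper isn't shown in the answer; a minimal sketch, assuming progressFunction is the callback from the question:

from pytube import YouTube

def downloadVideo(link):
    # Hypothetical helper: attach the progress callback to each video,
    # then download its first available stream
    yt = YouTube(link, on_progress_callback=progressFunction)
    yt.streams.first().download()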

According to the source code, the Playlist class doesn't accept an on_progress_callback keyword argument, only the url one.

You can use the register_on_progress_callback function to register a download progress callback function post initialization.
An example of this would be:
p = Playlist('https://www.youtube.com/playlist?list=PLetg744TF10BrdPjaEXf4EsJ1wz6fyf95')
for v in p.videos:
    v.register_on_progress_callback(progressFunction)
    # proceed to downloading...

from pytube import Playlist
from pytube.cli import on_progress

yt_playlist = Playlist(url)
for video in yt_playlist.videos:
    video.register_on_progress_callback(on_progress)
    video.streams.first().download()
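If you want your own callback instead of pytube.cli.on_progress, recent pytube versions invoke it with three arguments: the stream, the chunk just received, and the number of bytes remaining. A minimal sketch:

def progressFunction(stream, chunk, bytes_remaining):
    # percentage of the file downloaded so far
    total = stream.filesize
    percent = (total - bytes_remaining) / total * 100
    print(f'Downloaded {percent:.1f}%')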


Unable to download video using pytube
import customtkinter
from pytube import YouTube

def startDownload():
    try:
        ytLink = link.get()
        YouTube(ytLink).streams.get_highest_resolution().download()
    except:
        print("YouTube link is invalid")
    print("Download Complete!")

# Link input
url_var = tkinter.StringVar()
link = customtkinter.CTkEntry(app, width=400, height=40, textvariable=url_var)
link.pack()

# Download button
download = customtkinter.CTkButton(app, text="Download", command=startDownload)
download.pack(padx=10, pady=10)
The error is at line number 6: the input is taken, but the download function doesn't work.
Output:
YouTube link is invalid
Download Complete!
Your import is wrong:
from pytube import Youtube
Note: this is YouTube, not Youtube.
Beyond that, you are downloading the first of the available streams, which is usually 720p. To download a 360p resolution stream instead, you can do:
YouTube('https://youtu.be/2lAe1cqCOXo').streams.filter(res="360p").first().download()
Short explanation: You need to use filter() to choose the specific resolution you want to download. For example, if you call:
yt = YouTube('https://youtu.be/2lAe1cqCOXo')
it assigns a YouTube object to yt. You can view all the available streams by typing:
yt.streams
You can then filter for the streams you want. To keep only 360p streams, write:
yt.streams.filter(res="360p")
To filter only 360p streams and download the first one, type this:
yt.streams.filter(res="360p").first().download()

How to make Python download an image from a URL, but skip it if the image is already downloaded

Say I have a random API, e.g. api.example.com. When you request it, it sends a random image and returns the JSON for it, like {"url": "api.example.com/img1.png"}. After parsing the JSON, how can I download the image and save it in some folder, but skip the download if the image has already been downloaded (i.e. its file name is already taken)?
Edit: here is the code I have so far.
import urllib.request
import requests

url = "https://nekos.life/api/v2/img/neko"
response = requests.get(url)
response.raise_for_status()
jsonResponse = response.json()
urll = jsonResponse["url"]
urllib.request.urlretrieve(urll, "neko.png")
As said in this article, I think [os.path][1] can do the job pretty well.
Just try to use:
os.path.exists(photo_path)
That should be it.
[1]: https://linuxize.com/post/python-check-if-file-exists/
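Putting the question's code and this check together, a minimal sketch (deriving the file name from the URL is an assumption; the question used a fixed neko.png):

import os
import urllib.request
import requests

url = "https://nekos.life/api/v2/img/neko"
response = requests.get(url)
response.raise_for_status()
image_url = response.json()["url"]

# Use the file name from the URL itself (e.g. img1.png) so a repeat
# of the same image is recognized on disk and not downloaded again
filename = os.path.basename(image_url)
if not os.path.exists(filename):
    urllib.request.urlretrieve(image_url, filename)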

How to download youtube playlist videos using pytube?

I am using Python 3.8.5.
#pip freeze
beautifulsoup4==4.9.1
bs4==0.0.1
packaging==20.4
pyparsing==2.4.7
PyQt5==5.15.0
PyQt5-sip==12.8.0
PyQtWebEngine==5.15.0
pytube3==9.6.4
sip==5.3.0
six==1.15.0
soupsieve==2.0.1
toml==0.10.1
typing-extensions==3.7.4.2
CODE
from pytube import Playlist

playlist = Playlist('https://www.youtube.com/playlist?list=PL6gx4Cwl9DGCkg2uj3PxUWhMDuTw3VKjM')
print('Number of videos in playlist: %s' % len(playlist.video_urls))
for video_url in playlist.video_urls:
    print(video_url)
playlist.download_all()
WARNING
Number of videos in playlist: 0
playlist.py:24: DeprecationWarning: Call to deprecated function download_all (This function will be removed in the future. Please iterate through .videos).
  playlist.download_all()
/media/hophoet/Nouveau nom/Projects/python/workspace/automates/autovideo/venv/lib/python3.8/site-packages/pytube/contrib/playlist.py:216: DeprecationWarning: Call to deprecated function _path_num_prefix_generator (This function will be removed in the future.)
Based on some research, it seems that there is a problem between pytube3==9.6.4 and YouTube's HTML code. The issue is related to how pytube3 reads a YouTube URL. The code below solves this by using a regex that matches YouTube's updated HTML code.
import re
from pytube import Playlist

playlist = Playlist("https://www.youtube.com/playlist?list=PL6gx4Cwl9DGCkg2uj3PxUWhMDuTw3VKjM")
playlist._video_regex = re.compile(r"\"url\":\"(/watch\?v=[\w-]*)")
print('Number of videos in playlist: %s' % len(playlist.video_urls))
for url in playlist.video_urls:
    print(url)
###############################
OUTPUT
###############################
Number of videos in playlist: 23
https://www.youtube.com/watch?v=HjuHHI60s44
https://www.youtube.com/watch?v=Z40N7b9NHTE
https://www.youtube.com/watch?v=FvziRqkLrEU
https://www.youtube.com/watch?v=XN2-87haa8k
https://www.youtube.com/watch?v=VgI4UKyL0Lc
https://www.youtube.com/watch?v=BvPIgm2SMG8
https://www.youtube.com/watch?v=DpdmUmglPBA
https://www.youtube.com/watch?v=BmVmJi5dR9c
https://www.youtube.com/watch?v=pYNuKXjcriM
https://www.youtube.com/watch?v=EWONqLqSxYc
https://www.youtube.com/watch?v=EKmLXiA4zaQ
https://www.youtube.com/watch?v=-DHCm9AlXvo
https://www.youtube.com/watch?v=7cRaGaIZQlo
https://www.youtube.com/watch?v=ZkcEB96iMFk
https://www.youtube.com/watch?v=5Fcf-8LPvws
https://www.youtube.com/watch?v=xWLgdSgsBFo
https://www.youtube.com/watch?v=QcKYFEgfV-I
https://www.youtube.com/watch?v=BtSQIxDPnLc
https://www.youtube.com/watch?v=O5kh_-6e4kk
https://www.youtube.com/watch?v=RuWVDz-48-o
https://www.youtube.com/watch?v=-yjc5Y7Wbmw
https://www.youtube.com/watch?v=C5T59WsrNCU
https://www.youtube.com/watch?v=MWldNGdX9zE
I did note that pytube3 is currently not being supported and that someone forked it as pytubeX. I'm still trying to figure out the download step, because I cannot get that piece to work with pytube3 or pytubeX. I will keep looking at this issue.
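In the meantime, here is a hedged sketch of what the download step might look like, combining the regex fix above with the per-video download used in other answers on this page (untested against pytube3/pytubeX, per the caveat above):

import re
from pytube import Playlist, YouTube

playlist = Playlist("https://www.youtube.com/playlist?list=PL6gx4Cwl9DGCkg2uj3PxUWhMDuTw3VKjM")
playlist._video_regex = re.compile(r"\"url\":\"(/watch\?v=[\w-]*)")

for url in playlist.video_urls:
    # download the first available stream of each video
    YouTube(url).streams.first().download()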

Posting to FB group with requests, allow youtube video to load

I made a simple Python script that posts a random YouTube video and a quote to Facebook group(s).
The problem is that it doesn't give Facebook the time to load the random video: at the moment the post shows the video URL as plain text, but I want it to appear as an embedded video preview (screenshots omitted).
My current code looks like this (I omitted sensitive data):
""" Song of the day script """
import facebook
import os
from pyquery import PyQuery
import requests
import random
class Sofy(object):
GROUPS = ["123", "123"]
FB_ACCESS_TOKEN = "123accesstoken"
PLAYLISTS = ["123youtubeplaylist"]
VIDEOS = []
def get_video(self):
req = requests.get("https://www.youtube.com/playlist?list={}".format(self.PLAYLISTS[0]))
pq = PyQuery(req.text)
for video in pq(".pl-video").items():
self.VIDEOS.append(video.attr("data-video-id"))
return "https://www.youtube.com/watch?v={}".format(random.choice(self.VIDEOS[-5:]))
def get_qoute(self):
pwd = os.path.dirname(os.path.realpath(__file__))
fx = pwd + '/quotes.txt'
lines = open(fx).read().splitlines()
return random.choice(lines)
def run(self):
quote = self.get_qoute()
video = self.get_video()
graph = facebook.GraphAPI(access_token=self.FB_ACCESS_TOKEN, version='2.2')
for group in self.GROUPS:
graph.put_object(group, "feed", message="{}\n Song of the day: {}".format(quote, video))
print "All done :)"
if __name__=='__main__':
sofy = Sofy()
sofy.run()
I tried doing this with Selenium, but it didn't quite work as expected. Also, this way looks cleaner, but I can't figure out how to let the YouTube video load. I'm not even sure if it's possible?
It doesn't look like you're actually sharing the link correctly; it looks like you're adding the URL into the 'message' parameter.
It should be attached correctly if you specify it in the 'link' parameter instead.
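For example, the put_object call in run() might become (a sketch; facebook-sdk forwards extra keyword arguments as POST fields, so 'link' goes through as a Graph API parameter):

# Pass the video URL via the 'link' parameter so Facebook attaches it
# as a shared link with a preview, instead of plain text in the message
graph.put_object(group, "feed",
                 message="{}\n Song of the day:".format(quote),
                 link=video)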

python: get all youtube video urls of a channel

I want to get all the video URLs of a specific channel. I think JSON with Python or Java would be a good choice. I can get the newest video with the following code, but how can I get ALL the video links (>500)?
import urllib, json
author = 'Youtube_Username'
inp = urllib.urlopen(r'http://gdata.youtube.com/feeds/api/videos?max-results=1&alt=json&orderby=published&author=' + author)
resp = json.load(inp)
inp.close()
first = resp['feed']['entry'][0]
print first['title'] # video title
print first['link'][0]['href'] #url
After the YouTube API change, max k.'s answer no longer works. As a replacement, the function below provides a list of the videos in a given channel. Please note that you need an API key for it to work.
import urllib
import json

def get_all_video_in_channel(channel_id):
    api_key = 'YOUR API KEY'  # replace with your own API key
    base_video_url = 'https://www.youtube.com/watch?v='
    base_search_url = 'https://www.googleapis.com/youtube/v3/search?'
    first_url = base_search_url + 'key={}&channelId={}&part=snippet,id&order=date&maxResults=25'.format(api_key, channel_id)
    video_links = []
    url = first_url
    while True:
        inp = urllib.urlopen(url)
        resp = json.load(inp)
        for i in resp['items']:
            if i['id']['kind'] == "youtube#video":
                video_links.append(base_video_url + i['id']['videoId'])
        try:
            next_page_token = resp['nextPageToken']
            url = first_url + '&pageToken={}'.format(next_page_token)
        except KeyError:
            # no nextPageToken means we reached the last page
            break
    return video_links
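A hypothetical call, borrowing the channel ID used in another answer on this page:

video_links = get_all_video_in_channel('UC9-y-6csu5WGm29I7JiwpnA')
print(len(video_links))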
Short answer:
Here's a library that can help with that.
pip install scrapetube
import scrapetube

videos = scrapetube.get_channel("UC9-y-6csu5WGm29I7JiwpnA")
for video in videos:
    print(video['videoId'])
Long answer:
The module mentioned above was created by me due to a lack of any other solutions. Here's what I tried:
Selenium. It worked, but had three big drawbacks: 1. it requires a web browser and driver to be installed; 2. it has big CPU and memory requirements; 3. it can't handle big channels.
Using youtube-dl, like this:
import youtube_dl

youtube_dl_options = {
    'skip_download': True,
    'ignoreerrors': True
}
with youtube_dl.YoutubeDL(youtube_dl_options) as ydl:
    videos = ydl.extract_info(f'https://www.youtube.com/channel/{channel_id}/videos')
This also works for small channels, but for bigger ones I would get blocked by YouTube for making so many requests in such a short time (because youtube-dl downloads more info for every video in the channel).
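Where it does work, the per-video URLs can be pulled out of the returned dict; a sketch, assuming youtube-dl's usual 'entries' and 'webpage_url' result fields:

# entries can contain None when ignoreerrors skips a broken video
urls = [entry['webpage_url'] for entry in videos['entries'] if entry]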
So I made the library scrapetube, which uses the web API to get all the videos.
Increase max-results from 1 to however many you want, but beware: they don't advise grabbing too many in one call, and the API will limit you to 50 (https://developers.google.com/youtube/2.0/developers_guide_protocol_api_query_parameters).
Instead, you could consider grabbing the data in batches of 25, say, by increasing start-index until none come back.
EDIT: Here's the code for how I would do it:
import urllib, json

author = 'Youtube_Username'
foundAll = False
ind = 1
videos = []
while not foundAll:
    inp = urllib.urlopen(r'http://gdata.youtube.com/feeds/api/videos?start-index={0}&max-results=50&alt=json&orderby=published&author={1}'.format(ind, author))
    try:
        resp = json.load(inp)
        inp.close()
        returnedVideos = resp['feed']['entry']
        for video in returnedVideos:
            videos.append(video)
        ind += 50
        print len(videos)
        if len(returnedVideos) < 50:
            foundAll = True
    except:
        # catch the case where the number of videos in the channel is a multiple of 50
        print "error"
        foundAll = True

for video in videos:
    print video['title']  # video title
    print video['link'][0]['href']  # url
Based on the code found here and at some other places, I've written a small script that does this. My script uses v3 of YouTube's API and does not hit the 500-result limit that Google has set for searches.
The code is available over at GitHub: https://github.com/dsebastien/youtubeChannelVideosFinder
An independent way of doing things: no API, no rate limit.
import requests

username = "marquesbrownlee"
url = "https://www.youtube.com/user/{}/videos".format(username)
page = requests.get(url).content
data = str(page).split(' ')
item = 'href="/watch?'
vids = [line.replace('href="', 'youtube.com') for line in data if item in line]  # list of all videos, each listed twice
print(vids[0])  # the latest video
The above code will scrape only a limited number of video URLs, 60 at most. How can I grab the URLs of all the videos present in the channel? Can you please suggest?
The above snippet also only returns the list of videos that are listed twice, not all the video URLs in the channel.
Using Selenium Chrome Driver:
from selenium import webdriver
from webdriver_manager.chrome import ChromeDriverManager
import time

driverPath = ChromeDriverManager().install()
driver = webdriver.Chrome(driverPath)

url = 'https://www.youtube.com/howitshouldhaveended/videos'
driver.get(url)

height = driver.execute_script("return document.documentElement.scrollHeight")
previousHeight = -1
while previousHeight < height:
    previousHeight = height
    driver.execute_script(f'window.scrollTo(0,{height + 10000})')
    time.sleep(1)
    height = driver.execute_script("return document.documentElement.scrollHeight")

vidElements = driver.find_elements_by_id('thumbnail')
vid_urls = []
for v in vidElements:
    vid_urls.append(v.get_attribute('href'))
This code has worked the few times I've tried it; however, you might need to tweak the sleep time, or add a way to recognize when the browser is still loading the extra information. It easily worked for me on a channel with 300+ videos, but it had issues with one that had 7000+ videos, because the time required to load the new videos in the browser became inconsistent.
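One hedged way to make the end-of-page detection more robust (the retry loop and max_retries are assumptions, not part of the original answer):

import time

def scroll_to_end(driver, pause=1.0, max_retries=3):
    # Keep scrolling until the page height stops growing for
    # max_retries consecutive checks, tolerating slow loads
    last_height = driver.execute_script("return document.documentElement.scrollHeight")
    retries = 0
    while retries < max_retries:
        driver.execute_script(f'window.scrollTo(0, {last_height + 10000})')
        time.sleep(pause)
        new_height = driver.execute_script("return document.documentElement.scrollHeight")
        if new_height == last_height:
            retries += 1  # nothing new loaded yet; give it another chance
        else:
            retries = 0
            last_height = new_height
    return last_height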
I modified the script originally posted by dermasmid to fit my needs. This is the result:
import scrapetube
import sys

path = '_list.txt'
sys.stdout = open(path, 'w')

videos = scrapetube.get_channel("UC9-y-6csu5WGm29I7JiwpnA")
for video in videos:
    print("https://www.youtube.com/watch?v=" + str(video['videoId']))
    # print(video['videoId'])
Basically, it saves all the URLs from the channel into a "_list.txt" file. I am using this "_list.txt" file to download all the videos with yt-dlp.exe. All the downloaded files have the .mp4 extension.
Now I need to create another "_playlist.txt" file that contains all the FILENAMES corresponding to each URL from the "_list.txt", as sketched below.
For example, for "https://www.youtube.com/watch?v=yG1m7oGZC48" the "_playlist.txt" should contain "Apple M1 Ultra & NUMA - Computerphile.mp4".
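One possible way to produce that mapping with yt-dlp itself (a sketch; --print filename asks yt-dlp to print the would-be output file name for each URL in the batch file without downloading anything):

yt-dlp --print filename -o "%(title)s.mp4" -a _list.txt > _playlist.txt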
I made some further improvements, to be able to enter the channel URL in the console and to print the result on screen as well as into an external file called "_list.txt".
import scrapetube
import sys

path = '_list.txt'

print('**********************\n')
print("The result will be saved in the '_list.txt' file.")
print("Enter Channel ID:")

# Prints the output to the console and into the '_list.txt' file.
class Logger:
    def __init__(self, filename):
        self.console = sys.stdout
        self.file = open(filename, 'w')

    def write(self, message):
        self.console.write(message)
        self.file.write(message)

    def flush(self):
        self.console.flush()
        self.file.flush()

sys.stdout = Logger(path)

# Remove the "https://www.youtube.com/channel/" prefix, if present.
# (str.strip would remove a set of characters, not the prefix string,
# so use replace instead.)
channel_id_input = input()
channel_id = channel_id_input.replace("https://www.youtube.com/channel/", "")

videos = scrapetube.get_channel(channel_id)
for video in videos:
    print("https://www.youtube.com/watch?v=" + str(video['videoId']))
    # print(video['videoId'])
