Pytube: how to add a progress bar? - python

I don't know how to do it, and whenever I try solutions from other topics about this question I get errors,
mostly "TypeError: show_progress_bar() missing 1 required positional argument: 'bytes_remaining'".
from pytube import YouTube

# took this def from another topic
def progress_function(stream, chunk, file_handle, bytes_remaining):
    percent = round((1 - bytes_remaining / video.filesize) * 100)
    if percent % 10 == 0:
        print(percent, 'done...')

url = "Any youtube url"
yt = YouTube(url, on_progress_callback=progress_function)
yt.streams[0].download()
For example, when I run this exact code it gives me that error.
I really can't comprehend its logic. I also searched the docs on the pytube3 website, but I can't solve this. Please help me. Thanks.

Remove the stream parameter and it will work; I recently tried to develop similar logic and ran into a similar error.
Here is the code that worked for me:
def progress(chunk, file_handle, bytes_remaining):
    global filesize
    remaining = (100 * bytes_remaining) / filesize
    step = 100 - int(remaining)
    print("Completed:", step)  # show the percentage of the completed download
The filesize can be retrieved once you select which video or audio stream to download, for example:
yt = YouTube(str(link), on_progress_callback=progress)  # declare YouTube
yt1 = yt.streams.get_by_itag(int(itag))  # the itag is shown when you list all the streams of a YouTube video
filesize = yt1.filesize
Hope this helps!
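For reference, here is a minimal self-contained sketch of the same idea for more recent pytube versions, where the callback is passed (stream, chunk, bytes_remaining) and the file size can be read from the stream object itself (the URL below is just a placeholder):

from pytube import YouTube

def progress_function(stream, chunk, bytes_remaining):
    # percentage of the file already downloaded
    percent = round((1 - bytes_remaining / stream.filesize) * 100)
    print(percent, '% done...')

url = "https://www.youtube.com/watch?v=<VIDEO_ID>"  # any YouTube URL
yt = YouTube(url, on_progress_callback=progress_function)
yt.streams.get_highest_resolution().download()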

Related

Python code not opening VLC player for Twitch stream instances

Hello, so I don't stream, but I wanted to make a video of people's reactions when they are suddenly hit with a lot of viewers (this would be accompanied by a chat bot too, and I'll tell them what it was as well as ask for permission to use the footage). So I thought it would be fun to look at view bots for Twitch and found one online (code below). I installed streamlink via pip and the Windows executable, and it seems to run ("found matching plugin twitch for URL ..."), but it doesn't actually increase viewership. I can only assume this is because it's not actually opening the VLC instances, so here I am wondering what I need to do. I have the latest version of Python, and git isn't trying to download and install anything, so I'm assuming streamlink is all I need, but I'm kind of confused why it wouldn't be opening the VLC instances. Any help is most appreciated.
Edit: I do have the proxies and am using a small number to try to get it working first; I'll buy more later, after I get this to work!
import concurrent.futures, time, random, os

# desired channel url
channel_url = 'https://www.twitch.tv/StreamerName'

# number of viewer bots
botcount = 10

# path to proxies.txt file
proxypath = r"C:\Proxy\proxy.txt"

# path to vlc
playerpath = r'"C:\Program Files\VideoLAN\VLC\vlc.exe"'

# takes proxies from proxies.txt and returns them as a list
def create_proxy_list(proxyfile, shared_list):
    with open(proxyfile, 'r') as file:
        proxies = [line.strip() for line in file]
    for i in proxies:
        shared_list.append(i)
    return shared_list

# takes random proxies from the proxies list and adds them to another list
def randproxy(proxylist, botcount):
    randomproxylist = list()
    for _ in range(botcount):
        proxy = random.choice(proxylist)
        randomproxylist.append(proxy)
        proxylist.remove(proxy)
    return randomproxylist

# launches a viewer bot after a short delay
def launchbots(proxy):
    time.sleep(random.randint(5, 10))
    os.system(f'streamlink --player={playerpath} --player-no-close --player-http --hls-segment-timeout 30 --hls-segment-attempts 3 --retry-open 1 --retry-streams 1 --retry-max 1 --http-stream-timeout 3600 --http-proxy {proxy} {channel_url} worst')

# calls the launchbots function asynchronously
def main(randomproxylist):
    with concurrent.futures.ThreadPoolExecutor() as executer:
        executer.map(launchbots, randomproxylist)

if __name__ == "__main__":
    main(randproxy(create_proxy_list(proxypath, shared_list=list()), botcount))

Python Pytube progress for playlist download

I am writing a program in Python using pytube, and I want to indicate progress when downloading a playlist. When downloading a single video I can do:
YouTube(url, on_progress_callback=progressFunction)
but that doesn't work when downloading a playlist:
Playlist(url, on_progress_callback=progressFunction)
I get the following error:
TypeError: __init__() got an unexpected keyword argument 'on_progress_callback'
Is there any way to get the progress when downloading a playlist?
Hey, you can get all the URLs from the Playlist and then download them one by one.
This works for me; all the best.
def getAllLinks(playList):
    '''
    This function takes a link to a playlist and returns the link of each video.
    :param playList:
    :return: A list of all URL links
    '''
    allLinks = []
    youtubeLink = 'https://www.youtube.com'
    pl = Playlist(playList)
    for linkprefix in pl.parse_links():
        allLinks.append(youtubeLink + linkprefix)
    return allLinks
From this you will get all the URLs, and then:
def downloadPlaylist(playlistLink):
    linkArray = getAllLinks(playlistLink)
    for link in linkArray:
        downloadVideo(link)
According to the source code, the Playlist class doesn't accept an on_progress_callback keyword argument; it only takes the url one.
You can use the register_on_progress_callback method on each video to register a download progress callback after initialization.
An example of this would be:
p = Playlist('https://www.youtube.com/playlist?list=PLetg744TF10BrdPjaEXf4EsJ1wz6fyf95')
for v in p.videos:
    v.register_on_progress_callback(progressFunction)
    # proceed to downloading...
from pytube import Playlist
from pytube.cli import on_progress

yt_playlist = Playlist(url)
for video in yt_playlist.videos:
    video.register_on_progress_callback(on_progress)
    video.streams.first().download()

Stream audio from pyaudio with Flask to HTML5

I want to stream the audio of my microphone (that is being recorded via pyaudio) via Flask to any client that connects.
This is where the audio comes from:
def getSound(self):
    # Current chunk of audio data
    data = self.stream.read(self.CHUNK)
    self.frames.append(data)
    wave = self.save(list(self.frames))
    return data
Here's my Flask code:
@app.route('/audiofeed')
def audiofeed():
    def gen(microphone):
        while True:
            sound = microphone.getSound()
            # with open('tmp.wav', 'rb') as myfile:
            #     yield myfile.read()
            yield sound
    return Response(stream_with_context(gen(Microphone())))
And this is the client:
<audio controls>
    <source src="{{ url_for('audiofeed') }}" type="audio/x-wav;codec=pcm">
    Your browser does not support the audio element.
</audio>
It does work sometimes, but most of the time I'm getting "[Errno 32] Broken pipe".
When uncommenting that with open("tmp.wav") part (the self.save() optionally takes all previous frames and saves them in tmp.wav), I kind of get a stream, but all that comes out of the speakers is a "clicking" noise.
I'm open to any suggestions. How do I get the input of my microphone live-streamed (no pre-recording!) to a web browser?
Thanks!
Try this, it worked for me. The shell command "cat" works perfectly; see the code.
I am using Flask.
import subprocess
import os
import inspect
from flask import Flask
from flask import Response

@app.route('/playaudio')
def playaudio():
    sendFileName = ""
    def generate():
        # get_list_all_files_name gives all the files inside the folder
        filesAudios = get_list_all_files_name(currentDir + "/streamingAudios/1")
        # audioPath is the path of an audio file on the system
        for audioPath in filesAudios:
            data = subprocess.check_output(['cat', audioPath])
            yield data
    return Response(generate(), mimetype='audio/mp3')
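The helper get_list_all_files_name and the variable currentDir are not defined in the answer above; a minimal sketch of what they might look like (the names come from the comments, the implementation is an assumption):

import os

currentDir = os.path.dirname(os.path.abspath(__file__))  # assumed: directory of this script

def get_list_all_files_name(folder):
    # Return the full paths of all files inside the given folder.
    return [os.path.join(folder, name) for name in sorted(os.listdir(folder))]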
This question was asked a long time ago, but since I spent an entire day figuring out how to implement the same thing, I want to give the answer. Maybe it will be helpful for somebody.
The "[Errno 32] Broken pipe" error comes from the fact that the client cannot play the audio and closes the stream.
The audio cannot be played due to the absence of a header in the data stream. You can easily create the header using the genHeader(sampleRate, bitsPerSample, channels, samples) function from the code here. This header has to be attached at least to the first chunk of sent data (chunk = header + data). Pay attention that audio can ONLY be played until the client reaches the file size specified in the header, so a workaround is to declare some big file size in the header, e.g. 2 GB.
Instead of datasize = len(samples) * channels * bitsPerSample in the header function, write datasize = 2000*10**6.
def gen_audio():
    CHUNK = 512
    sampleRate = 44100
    bitsPerSample = 16
    channels = 2
    wav_header = genHeader(sampleRate, bitsPerSample, channels)
    audio = AudioRead()
    data = audio.get_audio_chunck()
    chunck = wav_header + data
    while True:
        yield chunck
        data = audio.get_audio_chunck()
        chunck = data
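The genHeader function itself is not included above; here is a minimal sketch of a RIFF/WAVE header builder along those lines, with the oversized datasize trick baked in as a default (the signature and default value are assumptions, not code from the linked answer):

import struct

def genHeader(sampleRate, bitsPerSample, channels, datasize=2000 * 10**6):
    # Build a 44-byte RIFF/WAVE header that advertises `datasize` bytes of PCM data.
    byte_rate = sampleRate * channels * bitsPerSample // 8
    block_align = channels * bitsPerSample // 8
    header = b"RIFF"
    header += struct.pack('<I', datasize + 36)   # overall RIFF chunk size
    header += b"WAVEfmt "
    header += struct.pack('<IHHIIHH',
                          16,                    # fmt chunk size
                          1,                     # audio format: 1 = PCM
                          channels,
                          sampleRate,
                          byte_rate,
                          block_align,
                          bitsPerSample)
    header += b"data"
    header += struct.pack('<I', datasize)        # declared (oversized) data size
    return header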
After lots of research and tinkering I finally found the solution.
Basically it came down to serving pyaudio.paFloat32 audio data through WebSockets using Flask's SocketIO implementation and receiving/playing the data in JavaScript using HTML5's AudioContext.
As this requires quite some code, I think it would not be a good idea to post it all here. Instead, feel free to check out the project I'm using it in: simpleCam
The relevant code is in:
- noise_detector.py (recording)
- server.py (WebSocket transfer)
- static/js/player.js (receiving/playing)
Thanks everyone for the support!

Play mp4 video with python and gstreamer

I'm trying to play a video in mp4 format, but it's not working.
In the console I execute this line and it works:
gst-launch playbin uri=rtmp://localhost:1935/files/video.mp4
But if I change to version 1.0, only the audio works:
gst-launch-1.0 playbin uri=rtmp://localhost:1935/files/video.mp4
In Python I have the following code:
self.player = Gst.Pipeline.new("player")
source = Gst.ElementFactory.make("filesrc", "file-source")
demuxer = Gst.ElementFactory.make("mp4mux", "demuxer")
demuxer.connect("pad-added", self.demuxer_callback)
self.video_decoder = Gst.ElementFactory.make("x264enc", "video-decoder")
self.audio_decoder = Gst.ElementFactory.make("vorbisdec", "audio-decoder")
audioconv = Gst.ElementFactory.make("audioconvert", "converter")
audiosink = Gst.ElementFactory.make("autoaudiosink", "audio-output")
videosink = Gst.ElementFactory.make("autovideosink", "video-output")
self.queuea = Gst.ElementFactory.make("queue", "queuea")
self.queuev = Gst.ElementFactory.make("queue", "queuev")
colorspace = Gst.ElementFactory.make("videoconvert", "colorspace")
self.player.add(source)
self.player.add(demuxer)
self.player.add(self.video_decoder)
self.player.add(self.audio_decoder)
self.player.add(audioconv)
self.player.add(audiosink)
self.player.add(videosink)
self.player.add(self.queuea)
self.player.add(self.queuev)
self.player.add(colorspace)
source.link(demuxer)
self.queuev.link(self.video_decoder)
self.video_decoder.link(colorspace)
colorspace.link(videosink)
self.queuea.link(self.audio_decoder)
self.audio_decoder.link(audioconv)
audioconv.link(audiosink)
but I get this error:
Error: Error in the internal data flow. gstbasesrc.c(2865): gst_base_src_loop (): /GstPipeline:player/GstFileSrc:file-source:
streaming task paused, reason not-linked (-1)
What could be happening? I think I'm not decoding it correctly.
You are missing the linking of the demuxer pads to your queues. Demuxers have 'sometimes' pads, so you need to listen to their pad-added signal and link in that callback. Remember to check the pad caps once you get them and link to the appropriate branch of your pipeline.
You can read about dynamic pads here: http://gstreamer.freedesktop.org/data/doc/gstreamer/head/manual/html/chapter-pads.html#section-pads-dynamic
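For illustration, a minimal sketch of what such a pad-added callback could look like with qtdemux and the queue names from the question (this is just one possible shape for demuxer_callback, not the asker's actual code):

def demuxer_callback(self, demuxer, pad):
    # Inspect the caps of the newly added pad and link it to the matching branch.
    caps = pad.get_current_caps()
    name = caps.get_structure(0).get_name()
    if name.startswith("video/"):
        pad.link(self.queuev.get_static_pad("sink"))
    elif name.startswith("audio/"):
        pad.link(self.queuea.get_static_pad("sink"))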
You have in your code:
demuxer = Gst.ElementFactory.make("mp4mux", "demuxer")
demuxer.connect("pad-added", self.demuxer_callback)
I hope this is a cut/paste error, as demuxing with a mux will not work. I believe for an .mp4 file, the normal demuxer (if you are choosing one by hand) is qtdemux.
You could also use decodebin to decode the file for you.
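For comparison, the playbin route from the command line maps to very little Python; a minimal sketch, assuming the GStreamer Python bindings and the same RTMP URI as above:

import gi
gi.require_version("Gst", "1.0")
from gi.repository import Gst, GLib

Gst.init(None)
player = Gst.ElementFactory.make("playbin", "player")
player.set_property("uri", "rtmp://localhost:1935/files/video.mp4")
player.set_state(Gst.State.PLAYING)
GLib.MainLoop().run()  # keep the pipeline running until interrupted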

python: get all youtube video urls of a channel

I want to get all video URLs of a specific channel. I think JSON with Python or Java would be a good choice. I can get the newest video with the following code, but how can I get ALL video links (>500)?
import urllib, json
author = 'Youtube_Username'
inp = urllib.urlopen(r'http://gdata.youtube.com/feeds/api/videos?max-results=1&alt=json&orderby=published&author=' + author)
resp = json.load(inp)
inp.close()
first = resp['feed']['entry'][0]
print first['title'] # video title
print first['link'][0]['href'] #url
After the YouTube API change, max k.'s answer does not work. As a replacement, the function below provides a list of the YouTube videos in a given channel. Please note that you need an API key for it to work.
import urllib
import json

def get_all_video_in_channel(channel_id):
    api_key = 'YOUR API KEY'
    base_video_url = 'https://www.youtube.com/watch?v='
    base_search_url = 'https://www.googleapis.com/youtube/v3/search?'
    first_url = base_search_url + 'key={}&channelId={}&part=snippet,id&order=date&maxResults=25'.format(api_key, channel_id)
    video_links = []
    url = first_url
    while True:
        inp = urllib.urlopen(url)
        resp = json.load(inp)
        for i in resp['items']:
            if i['id']['kind'] == "youtube#video":
                video_links.append(base_video_url + i['id']['videoId'])
        try:
            next_page_token = resp['nextPageToken']
            url = first_url + '&pageToken={}'.format(next_page_token)
        except KeyError:
            break
    return video_links
Short answer:
Here's a library that can help with that.
pip install scrapetube

import scrapetube

videos = scrapetube.get_channel("UC9-y-6csu5WGm29I7JiwpnA")
for video in videos:
    print(video['videoId'])
Long answer:
The module mentioned above was created by me due to a lack of any other solutions. Here's what I tried:
Selenium. It worked but had three big drawbacks: 1. it requires a web browser and driver to be installed; 2. it has big CPU and memory requirements; 3. it can't handle big channels.
Using youtube-dl. Like this:
import youtube_dl

youtube_dl_options = {
    'skip_download': True,
    'ignoreerrors': True
}

with youtube_dl.YoutubeDL(youtube_dl_options) as ydl:
    videos = ydl.extract_info(f'https://www.youtube.com/channel/{channel_id}/videos')
This also works for small channels, but for bigger ones I would get blocked by YouTube for making so many requests in such a short time (because youtube-dl downloads more info for every video in the channel).
So I made the library scrapetube, which uses the web API to get all the videos.
Increase max-results from 1 to however many you want, but beware they don't advise grabbing too many in one call and will limit you at 50 (https://developers.google.com/youtube/2.0/developers_guide_protocol_api_query_parameters).
Instead you could consider grabbing the data in batches of 25, say, by changing the start-index until none comes back.
EDIT: Here's the code for how I would do it
import urllib, json

author = 'Youtube_Username'

foundAll = False
ind = 1
videos = []
while not foundAll:
    inp = urllib.urlopen(r'http://gdata.youtube.com/feeds/api/videos?start-index={0}&max-results=50&alt=json&orderby=published&author={1}'.format(ind, author))
    try:
        resp = json.load(inp)
        inp.close()
        returnedVideos = resp['feed']['entry']
        for video in returnedVideos:
            videos.append(video)
        ind += 50
        print len(videos)
        if len(returnedVideos) < 50:
            foundAll = True
    except:
        # catch the case where the number of videos in the channel is a multiple of 50
        print "error"
        foundAll = True

for video in videos:
    print video['title']  # video title
    print video['link'][0]['href']  # url
Based on the code found here and in some other places, I've written a small script that does this. My script uses v3 of YouTube's API and does not hit the 500-result limit that Google has set for searches.
The code is available over at GitHub: https://github.com/dsebastien/youtubeChannelVideosFinder
Independent way of doing things. No API, no rate limit.
import requests

username = "marquesbrownlee"
url = f"https://www.youtube.com/user/{username}/videos"
page = requests.get(url).content
data = str(page).split(' ')
item = 'href="/watch?'
vids = [line.replace('href="', 'youtube.com') for line in data if item in line]  # list of all videos, each listed twice
print(vids[0])  # index the latest video
The above code will scrape only a limited number of video URLs, at most around 60. How can I grab all the video URLs present in the channel? Can you please suggest?
Also, the above snippet only gives the list in which every video appears twice, not all the video URLs in the channel.
Using Selenium Chrome Driver:
from selenium import webdriver
from webdriver_manager.chrome import ChromeDriverManager
import time

driverPath = ChromeDriverManager().install()
driver = webdriver.Chrome(driverPath)
url = 'https://www.youtube.com/howitshouldhaveended/videos'
driver.get(url)

height = driver.execute_script("return document.documentElement.scrollHeight")
previousHeight = -1
while previousHeight < height:
    previousHeight = height
    driver.execute_script(f'window.scrollTo(0,{height + 10000})')
    time.sleep(1)
    height = driver.execute_script("return document.documentElement.scrollHeight")

vidElements = driver.find_elements_by_id('thumbnail')
vid_urls = []
for v in vidElements:
    vid_urls.append(v.get_attribute('href'))
This code has worked the few times I've tried it; however, you might need to tweak the sleep time, or add a way to recognize when the browser is still loading the extra information. It easily worked for me for getting a channel with 300+ videos, but it was having an issue with one that had 7000+ videos due to the time required to load the new videos on the browser becoming inconsistent.
I modified the script originally posted by dermasmid to fit my needs. This is the result:
import scrapetube
import sys

path = '_list.txt'
sys.stdout = open(path, 'w')

videos = scrapetube.get_channel("UC9-y-6csu5WGm29I7JiwpnA")

for video in videos:
    print("https://www.youtube.com/watch?v=" + str(video['videoId']))
    # print(video['videoId'])
Basically it saves all the URLs from the playlist into a "_list.txt" file. I am using this "_list.txt" file to download all the videos with yt-dlp.exe. All the downloaded files have the .mp4 extension.
Now I need to create another "_playlist.txt" file that contains all the FILENAMES corresponding to each URL from the "_list.txt".
For example, for "https://www.youtube.com/watch?v=yG1m7oGZC48" it should write "Apple M1 Ultra & NUMA - Computerphile.mp4" into the "_playlist.txt".
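One possible way to build that "_playlist.txt" (a sketch, assuming the yt-dlp Python package is installed; reading "_list.txt" and writing "<title>.mp4" per URL is my assumption of the desired output):

import yt_dlp

with open('_list.txt') as urls, open('_playlist.txt', 'w') as playlist:
    with yt_dlp.YoutubeDL({'quiet': True, 'skip_download': True}) as ydl:
        for url in urls:
            url = url.strip()
            if not url:
                continue
            info = ydl.extract_info(url, download=False)
            # Write "<video title>.mp4" for each URL, matching the downloaded files.
            playlist.write(f"{info['title']}.mp4\n")

Note that yt-dlp sanitizes some characters when it builds file names, so titles with special characters may need the same sanitization to match the downloaded files exactly.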
I made some further improvements: you can now enter the channel URL in the console, and the result is printed on screen and also written to an external file called "_list.txt".
import scrapetube
import sys

path = '_list.txt'

print('**********************\n')
print("The result will be saved in the '_list.txt' file.")
print("Enter Channel ID:")

# Prints the output in the console and into the '_list.txt' file.
class Logger:
    def __init__(self, filename):
        self.console = sys.stdout
        self.file = open(filename, 'w')
    def write(self, message):
        self.console.write(message)
        self.file.write(message)
    def flush(self):
        self.console.flush()
        self.file.flush()

sys.stdout = Logger(path)

# Remove the "https://www.youtube.com/channel/" prefix if it is present
channel_id_input = input()
channel_id = channel_id_input.replace("https://www.youtube.com/channel/", "")

videos = scrapetube.get_channel(channel_id)

for video in videos:
    print("https://www.youtube.com/watch?v=" + str(video['videoId']))
    # print(video['videoId'])
