Python Code not opening VLC player of Twitch stream instances - python

Hello. I don't stream, but I wanted to make a video on people's reactions when they are suddenly hit with a lot of viewers (this would be accompanied by a chat bot too, and I'll tell them what it was as well as ask for permission to use the footage). So I thought it would be fun to look at view bots for Twitch and found one online (code below). I installed streamlink via pip and the Windows executable, and it seems to run: it prints "found matching plugin twitch for URL "Stream link"", but it doesn't actually increase viewership. I can only assume this is because it's not actually opening the VLC instances. So here I am, wondering what I need to do. I have the latest version of Python, and git isn't trying to download and install anything, so I'm assuming streamlink is all I need, but I'm kind of confused about why it wouldn't be opening the VLC instances. Any help is most appreciated.
Edit: Oh, and I do have the proxies. I'm using a small number to try to get it to work first, and will buy more later, but only after I get this to work!
import concurrent.futures, time, random, os
#desired channel url
channel_url = 'https://www.twitch.tv/StreamerName'
#number of viewer bots
botcount = 10
#path to proxies.txt file
proxypath = "C:\Proxy\proxy.txt"
#path to vlc
playerpath = r'"C:\Program Files\VideoLAN\VLC\vlc.exe"'
#takes proxies from proxies.txt and returns to list
def create_proxy_list(proxyfile, shared_list):
with open(proxyfile, 'r') as file:
proxies = [line.strip() for line in file]
for i in proxies:
shared_list.append((i))
return shared_list
#takes random proxies from the proxies list and adds them to another list
def randproxy(proxylist, botcount):
randomproxylist = list()
for _ in range(botcount):
proxy = random.choice(proxylist)
randomproxylist.append(proxy)
proxylist.remove(proxy)
return (randomproxylist)
#launches a viewer bot after a short delay
def launchbots(proxy):
time.sleep(random.randint(5, 10))
os.system(f'streamlink --player={playerpath} --player-no-close --player-http --hls-segment-timeout 30 --hls-segment-attempts 3 --retry-open 1 --retry-streams 1 --retry-max 1 --http-stream-timeout 3600 --http-proxy {proxy} {channel_url} worst')
#calls the launchbots function asynchronously
def main(randomproxylist):
with concurrent.futures.ThreadPoolExecutor() as executer:
executer.map(launchbots, randomproxylist)
if __name__ == "__main__":
main(randproxy(create_proxy_list(proxypath, shared_list=list()), botcount))

Related

Python - Downloading PDF and saving to disk using Selenium

I'm creating an application that downloads PDFs from a website and saves them to disk. I understand the Requests module is capable of this but does not itself handle the logic around the download (file size, progress, time remaining, etc.).
I've created the program using Selenium thus far and would like to eventually incorporate this into a Tkinter GUI app.
What would be the best way to handle the downloading and tracking, and eventually create a progress bar?
This is my code so far:
from selenium import webdriver
from time import sleep
import requests
import secrets
class manual_grabber():
    """ A class creating a manual downloader for the Roger Technology website """

    def __init__(self):
        """ Initialize attributes of manual grabber """
        self.driver = webdriver.Chrome('\\Users\\Joel\\Desktop\\Python\\manual_grabber\\chromedriver.exe')

    def login(self):
        """ Function controlling the login logic """
        self.driver.get('https://rogertechnology.it/en/b2b')
        sleep(1)
        # Locate elements and enter login details
        user_in = self.driver.find_element_by_xpath('/html/body/div[2]/form/input[6]')
        user_in.send_keys(secrets.username)
        pass_in = self.driver.find_element_by_xpath('/html/body/div[2]/form/input[7]')
        pass_in.send_keys(secrets.password)
        enter_button = self.driver.find_element_by_xpath('/html/body/div[2]/form/div/input')
        enter_button.click()
        # Click Self Service Area button
        self_service_button = self.driver.find_element_by_xpath('//*[@id="bs-example-navbar-collapse-1"]/ul/li[1]/a')
        self_service_button.click()

    def download_file(self):
        """Access file tree and navigate to PDF's and download"""
        # Wait for all elements to load
        sleep(3)
        # Find and switch to iFrame
        frame = self.driver.find_element_by_xpath('//*[@id="siteOutFrame"]/iframe')
        self.driver.switch_to.frame(frame)
        # Find and click tech manuals button
        tech_manuals_button = self.driver.find_element_by_xpath('//*[@id="fileTree_1"]/ul/li/ul/li[6]/a')
        tech_manuals_button.click()

bot = manual_grabber()
bot.login()
bot.download_file()
So in summary, I'd like to make this code download PDFs from a website, store them in a specific directory (named after its parent folder in the jQuery File Tree) and keep track of the progress (file size, time remaining, etc.).
Here is the DOM:
I hope this is enough information. Any more required please let me know.
I would recommend using tqdm and the requests module for this.
Here is sample code that handles the download and updates a progress bar as it goes.
from tqdm import tqdm
import requests
url = "http://www.ovh.net/files/10Mb.dat" #big file test
# Streaming, so we can iterate over the response.
response = requests.get(url, stream=True)
total_size_in_bytes= int(response.headers.get('content-length', 0))
block_size = 1024 #1 Kibibyte
progress_bar = tqdm(total=total_size_in_bytes, unit='iB', unit_scale=True)
with open('test.dat', 'wb') as file:
    for data in response.iter_content(block_size):
        progress_bar.update(len(data)) #change this to your widget in tkinter
        file.write(data)
progress_bar.close()
if total_size_in_bytes != 0 and progress_bar.n != total_size_in_bytes:
    print("ERROR, something went wrong")
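Here block_size is the size of each chunk read from the response, and total_size_in_bytes is the file size taken from the Content-Length header. The time remaining can be estimated from the download rate so far (bytes received divided by elapsed time) and the number of bytes still outstanding; tqdm displays this estimate itself when the total is known. Here is an alternative - How to measure download speed and progress using requests?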
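If the progress is shown in a Tkinter widget rather than by tqdm, the time-remaining figure has to be computed by hand. The following is a minimal sketch of that arithmetic only; the function name and its parameters are illustrative and are not part of requests or tqdm.

```python
def estimate_remaining_seconds(total_bytes, bytes_done, elapsed_seconds):
    """Estimate the seconds left from the average download rate so far.

    Returns None when no rate can be computed yet (nothing received or
    no time elapsed), so the caller can show a placeholder instead.
    """
    if bytes_done <= 0 or elapsed_seconds <= 0:
        return None
    rate = bytes_done / elapsed_seconds          # average bytes per second
    return (total_bytes - bytes_done) / rate     # seconds still needed


# Example: 2.5 MB of a 10 MB file received in 5 seconds.
remaining = estimate_remaining_seconds(10_000_000, 2_500_000, 5.0)
print(remaining)
```

Inside the download loop, bytes_done would be the running sum of len(data) and elapsed_seconds the difference between time.monotonic() now and at the start of the download.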

Discord bot - issue saving a text file after hosting

OK, I have been trying to find a solution myself for quite some time, but everything I attempt either ends up not being a solution or is too complex for me to attempt without knowing it will work.
I have a Discord bot, made in Python. The bot's purpose is to parse a blog for HTML links; when a new HTML link is posted, it will then post the link into Discord.
I am using a text file to save the latest link, and then parsing the website every 30 seconds to check whether a new link has been posted, by comparing the link at position 0 in the array to the link in the text file.
Now, I have managed to host my bot on Heroku with some success. However, I have since learned that Heroku cannot persist changes to my text file: since it pulls the code from GitHub, any changes are reverted after ~24 hours.
Since learning this I have attempted to host the text file in an AWS S3 bucket. However, as far as I can tell, it can add and delete files but not modify existing ones, and can only write new files from existing files on my system. That means that if I could do this, I wouldn't need to, since I would be able to modify the file on my own system and not need to host it anywhere.
I am looking for hopefully simple solutions/suggestions.
I am open to changing the hosting/whatever is needed, however I cannot pay for hosting.
Thanks in advance.
EDIT
So, I am editing this because I have a working solution thanks to a suggestion commented below.
The solution is to get my Python bot to commit the new file to GitHub, and then use that committed file's content as the reference.
import base64
import os
from github import Github
from github import InputGitTreeElement
user = os.environ.get("GITHUB_USER")
password = os.environ.get("GITHUB_PASSWORD")
g = Github(user,password)
repo = g.get_user().get_repo('YOUR REPO NAME HERE')
file_list = [
    'last_update.txt'
]
file_names = [
    'last_update.txt',
]
def git_commit():
    commit_message = 'News link update'
    master_ref = repo.get_git_ref('heads/master')
    master_sha = master_ref.object.sha
    base_tree = repo.get_git_tree(master_sha)
    element_list = list()
    for i, entry in enumerate(file_list):
        with open(entry) as input_file:
            data = input_file.read()
        if entry.endswith('.png'):
            data = base64.b64encode(data)
        element = InputGitTreeElement(file_names[i], '100644', 'blob', data)
        element_list.append(element)
    tree = repo.create_git_tree(element_list, base_tree)
    parent = repo.get_git_commit(master_sha)
    commit = repo.create_git_commit(commit_message, tree, [parent])
    master_ref.edit(commit.sha)
I then have a method called 'check_latest_link' which fetches the file from my GitHub repo in RAW format and returns its contents as a string, which I assign to my variable 'last_saved_link'.
import requests
def check_latest_link():
    res = requests.get('[YOUR GITHUB PAGE LINK - RAW FORMAT]')
    content = res.text
    return(content)
Then in my main method I have the following:
@client.event
async def task():
    await client.wait_until_ready()
    print('Running')
    while True:
        channel = discord.Object(id=channel_id)
        #parse_links() is a method to parse HTML links from a website
        news_links = parse_links()
        last_saved_link = check_latest_link()
        print('Running')
        await asyncio.sleep(5)
        #below compares the parsed HTML, to the saved reference,
        #if they are not the same then there is a new link to post.
        if last_saved_link != news_links[0]:
            #the 3 methods below (read_file, delete_contents and save_to_file)
            #are methods that simply do what they suggest to a text file specified elsewhere
            read_file()
            delete_contents()
            save_to_file(news_links[0])
            #then we have the git_commit previously shown.
            git_commit()
            #after git_commit, I was having an issue with the github reference
            #not updating for a few minutes, so the bot posts the message and
            #then goes to sleep for 500 seconds, this stops the bot from
            #posting duplicate messages. because this is an async function,
            #it will not stop other async functions from executing.
            await client.send_message(channel, news_links[0])
            await asyncio.sleep(500)
I am posting this so I can close the thread with an "Answer" - please refer to post edit.
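The compare-and-save step in the loop above can be isolated from where the reference is stored. The following is a minimal sketch under that assumption: check_and_update, load_saved, save and the in-memory store are hypothetical names standing in for check_latest_link, save_to_file/git_commit and the committed text file. It also strips whitespace before comparing, since a file fetched in RAW format may end with a newline that the parsed link does not have.

```python
def check_and_update(news_links, load_saved, save):
    """Return the newest link if it differs from the saved one, else None.

    load_saved: callable returning the previously saved link as a string.
    save:       callable that persists the new link.
    """
    if not news_links:
        return None
    newest = news_links[0].strip()
    saved = load_saved().strip()   # ignore a trailing newline in the stored file
    if newest == saved:
        return None
    save(newest)
    return newest


# Hypothetical in-memory store standing in for the text file on GitHub.
store = {"last": "https://example.com/post-1\n"}

def load_saved():
    return store["last"]

def save(link):
    store["last"] = link

links = ["https://example.com/post-2", "https://example.com/post-1"]
first = check_and_update(links, load_saved, save)    # new link, gets saved
second = check_and_update(links, load_saved, save)   # nothing new this time
```

Only when the function returns a link would the bot commit the file and post the message.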

Incomplete HAR list using Python: Browsermobproxy, selenium, phantomJS

I'm fairly new to Python and learn by doing, so I thought I'd give this project a shot. I'm trying to create a script which finds the Google Analytics request for a certain website, parses the request payload and does something with it.
Here are the requirements:
Ask user for 2 urls ( for comparing the payloads from 2 diff. HAR payloads)
Use selenium to open the two urls, use browsermobproxy/phantomJS to get all HAR
Store the HAR as a list
From the list of all HAR files, find the google analytics request, including the payload
If Google Analytics tag found, then do things....like parse the payload, etc. compare the payload, etc.
Issue: Sometimes, for a website that I know has Google Analytics (e.g. nytimes.com), the HAR that I get is incomplete. My program will say "GA Not found", but that's only because the complete HAR was not captured, so when the regex ran to find the matching entry it wasn't there. This issue is intermittent and does not happen all the time. Any ideas?
I'm thinking that due to some dependency or latency, the script moved on before the complete HAR was captured. I tried "wait for traffic to stop", but maybe I didn't do something right.
Also, as a bonus, I would appreciate any help you can provide on how to make this script run faster; it's fairly slow. As I mentioned, I'm new to Python, so go easy :)
This is what I've got thus far.
import browsermobproxy as mob
from selenium import webdriver
import re
import sys
import urlparse
import time
from datetime import datetime
def cleanup():
    s.stop()
    driver.quit()
proxy_path = '/Users/bob/Downloads/browsermob-proxy-2.1.4-bin/browsermob-proxy-2.1.4/bin/browsermob-proxy'
s = mob.Server(proxy_path)
s.start()
proxy = s.create_proxy()
proxy_address = "--proxy=127.0.0.1:%s" % proxy.port
service_args = [proxy_address, '--ignore-ssl-errors=yes', '--ssl-protocol=any'] # so that i can do https connections
driver = webdriver.PhantomJS(executable_path='/Users/bob/Downloads/phantomjs-2.1.1-windows/phantomjs-2.1.1-windows/bin/phantomjs', service_args=service_args)
driver.set_window_size(1400, 1050)
urlLists = []
collectTags = []
gaCollect = 0
varList = []
for x in range(0,2): # I want to ask the user for 2 inputs
    url = raw_input("Enter a website to find GA on: ")
    time.sleep(2.0)
    urlLists.append(url)
    if not url:
        print "You need to type something in...here"
        sys.exit()
#gets the two user url and stores in list
for urlList in urlLists:
    print urlList, 'start 2nd loop' #printing for debug purpose, no need for this
    if not urlList:
        print 'Your Url list is empty'
        sys.exit()
    proxy.new_har()
    driver.get(urlList)
    #proxy.wait_for_traffic_to_stop(15, 30) #<-- tried this but did not do anything
    for ent in proxy.har['log']['entries']:
        gaCollect = (ent['request']['url'])
        print gaCollect
        if re.search(r'google-analytics.com/r\b', gaCollect):
            print 'Found GA'
            collectTags.append(gaCollect)
            time.sleep(2.0)
            break
        else:
            print 'No GA Found - Ending Prog.'
            cleanup()
            sys.exit()
cleanup()
This might be a stale question, but I found an answer that worked for me.
You need to change two things:
1 - Remove sys.exit() -- this causes your programme to stop after the first iteration through the ent list, so if what you want is not the first thing, it won't be found
2 - call new_har with the captureContent option enabled to get the payload of requests:
proxy.new_har(options={'captureHeaders':True, 'captureContent': True})
See if that helps.
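The first point can be sketched as a scan that only decides "not found" after every entry has been checked. This is a minimal Python 3 illustration, not the asker's script: find_ga_requests, parse_payload and sample_har are hypothetical names, the sample data is made up, and the dictionary only mimics the log/entries/request/url layout used in the question. The regex is the one from the question with the dot escaped.

```python
import re
from urllib.parse import urlsplit, parse_qs

# Same pattern as in the question, with the dot escaped.
GA_PATTERN = re.compile(r"google-analytics\.com/r\b")


def find_ga_requests(har):
    """Return every request URL in the HAR that looks like a GA hit."""
    urls = []
    for entry in har["log"]["entries"]:
        url = entry["request"]["url"]
        if GA_PATTERN.search(url):
            urls.append(url)
    return urls  # an empty list means "not found", decided after the full scan


def parse_payload(url):
    """Split the query string of a GA request into a dict of lists."""
    return parse_qs(urlsplit(url).query)


# Made-up HAR fragment for illustration only.
sample_har = {"log": {"entries": [
    {"request": {"url": "https://example.com/index.html"}},
    {"request": {"url": "https://www.google-analytics.com/analytics.js"}},
    {"request": {"url": "https://www.google-analytics.com/r/collect?v=1&tid=UA-0000-1"}},
]}}

ga_urls = find_ga_requests(sample_har)
payload = parse_payload(ga_urls[0]) if ga_urls else {}
```

With the real proxy, the argument would be proxy.har after the page has finished loading.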

Python bindings for libVLC - cannot change audio output device

VLC 2.2.3, python-vlc 1.1.2.
I have a virtual audio output and am trying to get libVLC to output on it. So far, I can see the virtual output appearing in libVLC, but selecting it makes audio play on the default output (i.e. the speakers).
This is the relevant part of what I have:
self.Instance = vlc.Instance()
self.player = self.Instance.media_player_new()
devices = []
mods = self.player.audio_output_device_enum()
if mods:
    mod = mods
    while mod:
        mod = mod.contents
        devices.append(mod.device)
        mod = mod.next
vlc.libvlc_audio_output_device_list_release(mods)
# this is the part I change on each run of the code.
self.player.audio_output_device_set(None, devices[0])
I've run the code multiple times, changing the device ID as per the code comment. However, the output device doesn't actually change. This is a headache for two reasons:
1) audio_output_device_set() doesn't return anything. I can't tell if I'm actually accomplishing anything when I run this function.
2) I can't even run audio_output_device_get() to check if the set function is doing anything as this is only for libvlc 3. I would prefer for my program to work with 2.2.3.
So, what I did next was install VLC 3.0 and run the above code with it. Now, audio_output_device_get() works and I can see that the set function is actually changing the output device to the virtual output. But sound STILL plays on the speakers.
What's going on? How do I fix this?
I asked at the VLC forums and got a singularly unhelpful reply telling me to 'check logs and documentation'. That's it. I've been superglued to the rather lacking documentation to get this far. Even though I doubt it can help, I've decided to try logging. I thought it would be as simple as calling libvlc_log_set_file but it needs a libVLC file pointer, and I don't know how to create one with a name and mode as in Python.
tl;dr:
1) How do I successfully change the audio output device?
2) How do I set up maximum verbosity logging?
1) For some reason, I had to pause and unpause before VLC would register my change.
This code fixes things:
[... rest of GUI class ...]
        self.p.play()
        self.root.after(350, self.DeviceSet)

    def DeviceSet(self):
        self.p.audio_output_device_set(None, self.audiodevice)
        self.p.pause()
        self.root.after(10)
        self.p.pause()
2) Initialise VLC as follows:
self.instance = vlc.Instance('--verbose 9')
Here is a full example of how to switch to different audio device.
Remember: don't call player.stop() after player.audio_output_device_set(), otherwise the set operation won't work!!
import time
from typing import List
import vlc
def vlc_set_device_test(filename: str):
    # creating a vlc instance
    vlc_instance: vlc.Instance = vlc.Instance()
    player: vlc.MediaPlayer = vlc_instance.media_player_new()
    media: vlc.Media = vlc_instance.media_new(filename)
    player.set_media(media)
    # list devices
    device_ids: List[bytes] = []
    mods = player.audio_output_device_enum()
    if mods:
        index = 0
        mod = mods
        while mod:
            mod = mod.contents
            desc = mod.description.decode('utf-8', 'ignore')
            print(f'index = {index}, desc = {desc}')
            device_ids.append(mod.device)
            mod = mod.next
            index += 1
    # free devices
    vlc.libvlc_audio_output_device_list_release(mods)
    # hard code device
    pc_speaker = device_ids[1]
    headset = device_ids[3]
    # play music to default device
    player.play()
    time.sleep(3)
    # set output device
    player.audio_output_device_set(None, headset)
    # don't call player.stop()!!
    player.pause()
    # now music is playing from headset
    player.play()
    time.sleep(10)
    player.stop()

if __name__ == '__main__':
    vlc_set_device_test(r'D:\cheer up.mp3')

Downloading Streams Simultaneously with Python 3.5

EDIT: I think I've figured out a solution using subprocess.Popen with separate .py files for each stream being monitored. It's not pretty, but it works.
I'm working on a script to monitor a streaming site for several different accounts and to record when they are online. I am using the livestreamer package for downloading a stream when it comes online, but the problem is that the program will only record one stream at a time. I have the program loop through a list and if a stream is online, start recording with subprocess.call(["livestreamer"... The problem is that once the program starts recording, it stops going through the loop and doesn't check or record any of the other livestreams. I've tried using Process and Thread, but none of these seem to work. Any ideas?
Code below. Asterisks are not literally part of code.
import os,urllib.request,time,subprocess,datetime,random
status = {
"********":False,
"********":False,
"********":False
}
def gen_name(tag):
    return stuff <<Bunch of unimportant code stuff here.
def dl(tag):
    subprocess.call(["livestreamer","********.com/"+tag,"best","-o",".\\tmp\\"+gen_name(tag)])
def loopCheck():
    while True:
        for tag in status:
            data = urllib.request.urlopen("http://*******.com/" + tag + "/").read().decode()
            if data.find(".m3u8") != -1:
                print(tag + " is online!")
                if status[tag] == False:
                    status[tag] = True
                    dl(tag)
            else:
                print(tag+ " is offline.")
                status[tag] = False
        time.sleep(15)
loopCheck()
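
The loop stalls because subprocess.call waits for the recorder to exit before returning. The subprocess.Popen approach mentioned in the edit can also be used from a single script, since Popen returns immediately. The following is a minimal sketch under that assumption; start_recording, placeholder_command and the running dictionary are hypothetical names, and the command is a short-lived stand-in for the real livestreamer invocation so the example can run anywhere.

```python
import subprocess
import sys


def start_recording(tag, running, command_for):
    """Start a recorder for tag unless one is still running.

    running:     dict mapping tag -> subprocess.Popen handle.
    command_for: callable building the argument list for a tag.
    Returns True if a new process was started.
    """
    proc = running.get(tag)
    if proc is not None and proc.poll() is None:
        return False                      # previous recorder has not exited yet
    running[tag] = subprocess.Popen(command_for(tag))  # returns immediately
    return True


def placeholder_command(tag):
    # Stand-in for the question's ["livestreamer", url, "best", "-o", path] list.
    return [sys.executable, "-c", "import time; time.sleep(1)"]


running = {}
started_first = start_recording("a", running, placeholder_command)
started_second = start_recording("b", running, placeholder_command)
started_again = start_recording("a", running, placeholder_command)  # already running

for proc in running.values():
    proc.wait()                           # only so this demo exits cleanly
```

In the monitoring loop, dl(tag) would be replaced by a call like start_recording(tag, running, ...), and poll() returning a value other than None tells the loop that a recording has ended.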
