I'm trying to get some data from all running IE instances to hook the logged username from the page source, i tried with selenium to make the users run IE from selenium and then when the user logs in i get the Username and it kinda works in the first tab only, if the user go to the login page from another window/tab, i can't catch that. It looks like selenium with IE window_handles dont work very well.
Searching on google i might have found something better for me that could grab some data from stock internet explorer in all running tabs/windows.
Following this link i was able to get the Page Title and the Location URL and it also explain how to make the page navigate to another URL.
Now what I cant find is how to get the page source.
from win32com.client import Dispatch
from win32gui import GetClassName
ShellWindowsCLSID = '{9BA05972-F6A8-11CF-A442-00A0C90A8F39}'
ShellWindows = Dispatch ( ShellWindowsCLSID )
for sw in ShellWindows :
if GetClassName ( sw . HWND ) == 'IEFrame' :
print(sw)
print(sw.LocationName)
print(sw.LocationURL)
#sw.Document.Location = "http://python.org" navigating to another url
print(50 * '-')
I found a solution from this link so basically to get the page Source
from win32com.client import Dispatch
from win32gui import GetClassName
ShellWindowsCLSID = '{9BA05972-F6A8-11CF-A442-00A0C90A8F39}'
ShellWindows = Dispatch ( ShellWindowsCLSID )
for sw in ShellWindows :
if GetClassName ( sw . HWND ) == 'IEFrame' :
#print the source
print(sw.Document.body.outerHTML)
#get all elements
elements = sw.Document.all
#get all inputs
inputs = [x for x in elements if x.tagName == "INPUT"] # tag name is always uppercase
#get all buttons
buttons = [x for x in elements if x.tagName == "BUTTON"]
#get by class name
rows = [x for x in elements if "row" in x.className.split(' ')]
#get by id
username = [x for x in elements if x.getAttribute("id") == "username"]
Analyzing the file IEC.py you can see that you can set input value or click button, etc.
Related
I have python 3.8, using input type = "file" it duplicates the images and it doesn't work correctly. Is there any way to drag the images to the area?
My code:
from selenium import webdriver
import time
url="https://www.milanuncios.com/publicar-anuncios-gratis/media?idanuncio=365084685&contra=3W45"
browser = webdriver.Firefox()
browser.get(url)
time.sleep(4)
userID = browser.find_element_by_xpath(".//*[#type='file']")
userID.send_keys('foto_1.jpg')
time.sleep(2)
userID.send_keys('foto_2.jpg')
After sending each picture, set the value of Input to be "".
There are two ways you could try:
clear(), Clears the text if it’s a text entry element.
userID.send_keys('foto_1.jpg')
userID.clear()
userID.send_keys('foto_2.jpg')
set the value using pure js
userID.send_keys('foto_1.jpg')
driver.execute_script("arguments[0].value = '';", userID)
userID.send_keys('foto_2.jpg')
Input field supports multiple files upload (node contains multiple boolean attribute), so you can try to upload both files at the same time:
userID = browser.find_element_by_xpath(".//*[#type='file']")
userID.send_keys('foto_1.jpg' + '\n' + 'foto_2.jpg')
or
userID = browser.find_element_by_xpath(".//*[#type='file']")
files = ['foto_1.jpg', 'foto_2.jpg']
userID.send_keys('\n'.join(files))
I am trying to scrape webpages using python and selenium. I have a url which takes a single parameter and a list of valid parameters. I navigate to that url with a single parameter at a time and click on a link, a pop up window opens with a page.
The pop window automatically opens a print dialogue on page load.
Also the url bar is disabled for that popup.
My code:
def packAmazonOrders(self, order_ids):
order_window_handle = self.driver.current_window_handle
for each in order_ids:
self.driver.find_element_by_id('sc-search-field').send_keys(Keys.CONTROL, "a")
self.driver.find_element_by_id('sc-search-field').send_keys(Keys.DELETE)
self.driver.find_element_by_id('sc-search-field').send_keys(each)
self.driver.find_element_by_class_name('sc-search-button').click()
src = self.driver.page_source.encode('utf-8')
if 'Unshipped' in src and 'Easy Ship - Schedule pickup' in src:
is_valid = True
else:
is_valid = False
if is_valid:
print 'Packing Slip Start - %s' %each
self.driver.find_element_by_link_text('Print order packing slip').click()
handles = self.driver.window_handles
print handles
try:
handles.remove(order_window_handle)
except:
pass
self.driver.switch_to_window(handles.pop())
print handles
packing_slip_page = ''
packing_slip_page = self.driver.page_source.encode('utf-8')
if each in packing_slip_page:
print 'Packing Slip Window'
else:
print 'not found'
self.driver.close()
self.driver.switch_to_window(order_window_handle)
Now I have two questions:
How can I download that pop up page as pdf?
For first parameter every thing works fine. But for another parameters in the list the packing_slip_page does not update (which i think because of the disabled url bar. But not sure though.) I tried the print the handle (print handles) for each parametre but it always print the same value. So how to access the correct page source for other parameters?
I made a simple python script that posts a random youtube video and a quote to Facebook group(s).
The problem is, that it doesn't give Facebook the time to load the random video. To be more specific, at the moment the post looks like this:
But I want it to look like this:
My current code looks like this (I omitted sensitive data):
""" Song of the day script """
import facebook
import os
from pyquery import PyQuery
import requests
import random
class Sofy(object):
GROUPS = ["123", "123"]
FB_ACCESS_TOKEN = "123accesstoken"
PLAYLISTS = ["123youtubeplaylist"]
VIDEOS = []
def get_video(self):
req = requests.get("https://www.youtube.com/playlist?list={}".format(self.PLAYLISTS[0]))
pq = PyQuery(req.text)
for video in pq(".pl-video").items():
self.VIDEOS.append(video.attr("data-video-id"))
return "https://www.youtube.com/watch?v={}".format(random.choice(self.VIDEOS[-5:]))
def get_qoute(self):
pwd = os.path.dirname(os.path.realpath(__file__))
fx = pwd + '/quotes.txt'
lines = open(fx).read().splitlines()
return random.choice(lines)
def run(self):
quote = self.get_qoute()
video = self.get_video()
graph = facebook.GraphAPI(access_token=self.FB_ACCESS_TOKEN, version='2.2')
for group in self.GROUPS:
graph.put_object(group, "feed", message="{}\n Song of the day: {}".format(quote, video))
print "All done :)"
if __name__=='__main__':
sofy = Sofy()
sofy.run()
I tried doing this with Selenium but it didn't quote work as expected. Also, this way looks cleaner, but I can't figure out how to let youtube video load, I'm not even sure if it's possible?
It doesn't look like you're actually sharing the link correctly, looks like you're adding the URL into the 'message' parameter -
It should be attached correctly if you specify it in the 'link' parameter
Im new to python and figured that best way to learn is by practice, this is my first project.
So there is this fantasy football website. My goal is to create script which logins to site, automatically creates preselected team and submits it.
I have managed to get to submitting team part.
When I add a team member this data gets sent to server:
https://i.gyazo.com/e7e6f82ca91e19a08d1522b93a55719b.png
When I press save this list this data gets sent:
https://i.gyazo.com/546d49d1f132eabc5e6f659acf7c929e.png
Code:
import requests
with requests.Session() as c:
gameurl = 'here is link where data is sent'
BPL = ['5388', '5596', '5481', '5587',
'5585', '5514', '5099', '5249', '5566', '5501', '5357']
GID = '168'
UDID = '0'
ACT = 'draft'
ACT2 = 'save_draft'
SIGN = '18852c5f48a94bf3ee58057ff5c016af'
# eleven of those with different BPL since 11 players needed:
c.get(gameurl)
game_data = dict(player_id = BPL[0], action = ACT, id = GID)
c.post(gameurl, data = game_data)
# now I need to submit my list of selected players:
game_data_save = dict( action = ACT2, id = GID, user_draft_id = UDID, sign = SIGN)
c.post(gameurl, data = game_data_save)
This code works pretty fine, but the problem is, that 'SIGN' is unique for each individual game and I have no idea how to get this data without using Chromes inspect option.
How can I get this data simply running python code?
Because you said you can find it using devtools I'm assuming SIGN is written somewhere in the DOM.
In that case you can use requests.get().text to get the HTML of the page and parse it with a tool like lxml or HTMLParser
Solved by posting all data without 'SIGN' and in return I got 'SIGN' in html.
I want to get all video url's of a specific channel. I think json with python or java would be a good choice. I can get the newest video with the following code, but how can I get ALL video links (>500)?
import urllib, json
author = 'Youtube_Username'
inp = urllib.urlopen(r'http://gdata.youtube.com/feeds/api/videos?max-results=1&alt=json&orderby=published&author=' + author)
resp = json.load(inp)
inp.close()
first = resp['feed']['entry'][0]
print first['title'] # video title
print first['link'][0]['href'] #url
After the youtube API change, max k.'s answer does not work. As a replacement, the function below provides a list of the youtube videos in a given channel. Please note that you need an API Key for it to work.
import urllib
import json
def get_all_video_in_channel(channel_id):
api_key = YOUR API KEY
base_video_url = 'https://www.youtube.com/watch?v='
base_search_url = 'https://www.googleapis.com/youtube/v3/search?'
first_url = base_search_url+'key={}&channelId={}&part=snippet,id&order=date&maxResults=25'.format(api_key, channel_id)
video_links = []
url = first_url
while True:
inp = urllib.urlopen(url)
resp = json.load(inp)
for i in resp['items']:
if i['id']['kind'] == "youtube#video":
video_links.append(base_video_url + i['id']['videoId'])
try:
next_page_token = resp['nextPageToken']
url = first_url + '&pageToken={}'.format(next_page_token)
except:
break
return video_links
Short answer:
Here's a library That can help with that.
pip install scrapetube
import scrapetube
videos = scrapetube.get_channel("UC9-y-6csu5WGm29I7JiwpnA")
for video in videos:
print(video['videoId'])
Long answer:
The module mentioned above was created by me due to a lack of any other solutions. Here's what i tried:
Selenium. It worked but had three big drawbacks: 1. It requires a web browser and driver to be installed. 2. has big CPU and memory requirements. 3. can't handle big channels.
Using youtube-dl. Like this:
import youtube_dl
youtube_dl_options = {
'skip_download': True,
'ignoreerrors': True
}
with youtube_dl.YoutubeDL(youtube_dl_options) as ydl:
videos = ydl.extract_info(f'https://www.youtube.com/channel/{channel_id}/videos')
This also works for small channels, but for bigger ones i would get blocked by youtube for making so many requests in such a short time (because youtube-dl downloads more info for every video in the channel).
So i made the library scrapetube which uses the web API to get all the videos.
Increase max-results from 1 to however many you want, but beware they don't advise grabbing too many in one call and will limit you at 50 (https://developers.google.com/youtube/2.0/developers_guide_protocol_api_query_parameters).
Instead you could consider grabbing the data down in batches of 25, say, by changing the start-index until none came back.
EDIT: Here's the code for how I would do it
import urllib, json
author = 'Youtube_Username'
foundAll = False
ind = 1
videos = []
while not foundAll:
inp = urllib.urlopen(r'http://gdata.youtube.com/feeds/api/videos?start-index={0}&max-results=50&alt=json&orderby=published&author={1}'.format( ind, author ) )
try:
resp = json.load(inp)
inp.close()
returnedVideos = resp['feed']['entry']
for video in returnedVideos:
videos.append( video )
ind += 50
print len( videos )
if ( len( returnedVideos ) < 50 ):
foundAll = True
except:
#catch the case where the number of videos in the channel is a multiple of 50
print "error"
foundAll = True
for video in videos:
print video['title'] # video title
print video['link'][0]['href'] #url
Based on the code found here and at some other places, I've written a small script that does this. My script uses v3 of Youtube's API and does not hit against the 500 results limit that Google has set for searches.
The code is available over at GitHub: https://github.com/dsebastien/youtubeChannelVideosFinder
Independent way of doing things. No api, no rate limit.
import requests
username = "marquesbrownlee"
url = "https://www.youtube.com/user/username/videos"
page = requests.get(url).content
data = str(page).split(' ')
item = 'href="/watch?'
vids = [line.replace('href="', 'youtube.com') for line in data if item in line] # list of all videos listed twice
print(vids[0]) # index the latest video
This above code will scrap only limited number of video url's max upto 60. How to grab all the videos url which is present in the channel. Can you please suggest.
This above code snippet will display only the list of all the videos which is listed twice. Not all the video url's in the channel.
Using Selenium Chrome Driver:
from selenium import webdriver
from webdriver_manager.chrome import ChromeDriverManager
import time
driverPath = ChromeDriverManager().install()
driver = webdriver.Chrome(driverPath)
url = 'https://www.youtube.com/howitshouldhaveended/videos'
driver.get(url)
height = driver.execute_script("return document.documentElement.scrollHeight")
previousHeight = -1
while previousHeight < height:
previousHeight = height
driver.execute_script(f'window.scrollTo(0,{height + 10000})')
time.sleep(1)
height = driver.execute_script("return document.documentElement.scrollHeight")
vidElements = driver.find_elements_by_id('thumbnail')
vid_urls = []
for v in vidElements:
vid_urls.append(v.get_attribute('href'))
This code has worked the few times I've tried it; however, you might need to tweak the sleep time, or add a way to recognize when the browser is still loading the extra information. It easily worked for me for getting a channel with 300+ videos, but it was having an issue with one that had 7000+ videos due to the time required to load the new videos on the browser becoming inconsistent.
I modified the script originally posted by dermasmid to fit my needs. This is the result:
import scrapetube
import sys
path = '_list.txt'
sys.stdout = open(path, 'w')
videos = scrapetube.get_channel("UC9-y-6csu5WGm29I7JiwpnA")
for video in videos:
print("https://www.youtube.com/watch?v="+str(video['videoId']))
# print(video['videoId'])
Basically it is saves all the URLs from the playlist into a "_list.txt" file. I am using this "_list.txt" file to download all the videos using the yt-dlp.exe. All the downloaded files have the .mp4 extension.
Now I do need to create another "_playlist.txt" file that contains all the FILENAMES coresponding to each URL from the "_List.txt".
For example, for: "https://www.youtube.com/watch?v=yG1m7oGZC48" to have "Apple M1 Ultra & NUMA - Computerphile.mp4" as output into the "_playlist.txt"
I do made some further improvements, to be able to add the channel URL into the console, print the result on screen and also into an external file called "_list.txt".
import scrapetube
import sys
path = '_list.txt'
print('**********************\n')
print("The result will be saved in '_list.txt' file.")
print("Enter Channel ID:")
# Prints the output in the console and into the '_list.txt' file.
class Logger:
def __init__(self, filename):
self.console = sys.stdout
self.file = open(filename, 'w')
def write(self, message):
self.console.write(message)
self.file.write(message)
def flush(self):
self.console.flush()
self.file.flush()
sys.stdout = Logger(path)
# Strip the: "https://www.youtube.com/channel/"
channel_id_input = input()
channel_id = channel_id_input.strip("https://www.youtube.com/channel/")
videos = scrapetube.get_channel(channel_id)
for video in videos:
print("https://www.youtube.com/watch?v="+str(video['videoId']))
# print(video['videoId'])