python: get all youtube video urls of a channel - python

I want to get all video url's of a specific channel. I think json with python or java would be a good choice. I can get the newest video with the following code, but how can I get ALL video links (>500)?
import urllib, json
author = 'Youtube_Username'
inp = urllib.urlopen(r'http://gdata.youtube.com/feeds/api/videos?max-results=1&alt=json&orderby=published&author=' + author)
resp = json.load(inp)
inp.close()
first = resp['feed']['entry'][0]
print first['title'] # video title
print first['link'][0]['href'] #url

After the youtube API change, max k.'s answer does not work. As a replacement, the function below provides a list of the youtube videos in a given channel. Please note that you need an API Key for it to work.
import urllib
import json
def get_all_video_in_channel(channel_id):
api_key = YOUR API KEY
base_video_url = 'https://www.youtube.com/watch?v='
base_search_url = 'https://www.googleapis.com/youtube/v3/search?'
first_url = base_search_url+'key={}&channelId={}&part=snippet,id&order=date&maxResults=25'.format(api_key, channel_id)
video_links = []
url = first_url
while True:
inp = urllib.urlopen(url)
resp = json.load(inp)
for i in resp['items']:
if i['id']['kind'] == "youtube#video":
video_links.append(base_video_url + i['id']['videoId'])
try:
next_page_token = resp['nextPageToken']
url = first_url + '&pageToken={}'.format(next_page_token)
except:
break
return video_links

Short answer:
Here's a library That can help with that.
pip install scrapetube
import scrapetube
videos = scrapetube.get_channel("UC9-y-6csu5WGm29I7JiwpnA")
for video in videos:
print(video['videoId'])
Long answer:
The module mentioned above was created by me due to a lack of any other solutions. Here's what i tried:
Selenium. It worked but had three big drawbacks: 1. It requires a web browser and driver to be installed. 2. has big CPU and memory requirements. 3. can't handle big channels.
Using youtube-dl. Like this:
import youtube_dl
youtube_dl_options = {
'skip_download': True,
'ignoreerrors': True
}
with youtube_dl.YoutubeDL(youtube_dl_options) as ydl:
videos = ydl.extract_info(f'https://www.youtube.com/channel/{channel_id}/videos')
This also works for small channels, but for bigger ones i would get blocked by youtube for making so many requests in such a short time (because youtube-dl downloads more info for every video in the channel).
So i made the library scrapetube which uses the web API to get all the videos.

Increase max-results from 1 to however many you want, but beware they don't advise grabbing too many in one call and will limit you at 50 (https://developers.google.com/youtube/2.0/developers_guide_protocol_api_query_parameters).
Instead you could consider grabbing the data down in batches of 25, say, by changing the start-index until none came back.
EDIT: Here's the code for how I would do it
import urllib, json
author = 'Youtube_Username'
foundAll = False
ind = 1
videos = []
while not foundAll:
inp = urllib.urlopen(r'http://gdata.youtube.com/feeds/api/videos?start-index={0}&max-results=50&alt=json&orderby=published&author={1}'.format( ind, author ) )
try:
resp = json.load(inp)
inp.close()
returnedVideos = resp['feed']['entry']
for video in returnedVideos:
videos.append( video )
ind += 50
print len( videos )
if ( len( returnedVideos ) < 50 ):
foundAll = True
except:
#catch the case where the number of videos in the channel is a multiple of 50
print "error"
foundAll = True
for video in videos:
print video['title'] # video title
print video['link'][0]['href'] #url

Based on the code found here and at some other places, I've written a small script that does this. My script uses v3 of Youtube's API and does not hit against the 500 results limit that Google has set for searches.
The code is available over at GitHub: https://github.com/dsebastien/youtubeChannelVideosFinder

Independent way of doing things. No api, no rate limit.
import requests
username = "marquesbrownlee"
url = "https://www.youtube.com/user/username/videos"
page = requests.get(url).content
data = str(page).split(' ')
item = 'href="/watch?'
vids = [line.replace('href="', 'youtube.com') for line in data if item in line] # list of all videos listed twice
print(vids[0]) # index the latest video
This above code will scrap only limited number of video url's max upto 60. How to grab all the videos url which is present in the channel. Can you please suggest.
This above code snippet will display only the list of all the videos which is listed twice. Not all the video url's in the channel.

Using Selenium Chrome Driver:
from selenium import webdriver
from webdriver_manager.chrome import ChromeDriverManager
import time
driverPath = ChromeDriverManager().install()
driver = webdriver.Chrome(driverPath)
url = 'https://www.youtube.com/howitshouldhaveended/videos'
driver.get(url)
height = driver.execute_script("return document.documentElement.scrollHeight")
previousHeight = -1
while previousHeight < height:
previousHeight = height
driver.execute_script(f'window.scrollTo(0,{height + 10000})')
time.sleep(1)
height = driver.execute_script("return document.documentElement.scrollHeight")
vidElements = driver.find_elements_by_id('thumbnail')
vid_urls = []
for v in vidElements:
vid_urls.append(v.get_attribute('href'))
This code has worked the few times I've tried it; however, you might need to tweak the sleep time, or add a way to recognize when the browser is still loading the extra information. It easily worked for me for getting a channel with 300+ videos, but it was having an issue with one that had 7000+ videos due to the time required to load the new videos on the browser becoming inconsistent.

I modified the script originally posted by dermasmid to fit my needs. This is the result:
import scrapetube
import sys
path = '_list.txt'
sys.stdout = open(path, 'w')
videos = scrapetube.get_channel("UC9-y-6csu5WGm29I7JiwpnA")
for video in videos:
print("https://www.youtube.com/watch?v="+str(video['videoId']))
# print(video['videoId'])
Basically it is saves all the URLs from the playlist into a "_list.txt" file. I am using this "_list.txt" file to download all the videos using the yt-dlp.exe. All the downloaded files have the .mp4 extension.
Now I do need to create another "_playlist.txt" file that contains all the FILENAMES coresponding to each URL from the "_List.txt".
For example, for: "https://www.youtube.com/watch?v=yG1m7oGZC48" to have "Apple M1 Ultra & NUMA - Computerphile.mp4" as output into the "_playlist.txt"

I do made some further improvements, to be able to add the channel URL into the console, print the result on screen and also into an external file called "_list.txt".
import scrapetube
import sys
path = '_list.txt'
print('**********************\n')
print("The result will be saved in '_list.txt' file.")
print("Enter Channel ID:")
# Prints the output in the console and into the '_list.txt' file.
class Logger:
def __init__(self, filename):
self.console = sys.stdout
self.file = open(filename, 'w')
def write(self, message):
self.console.write(message)
self.file.write(message)
def flush(self):
self.console.flush()
self.file.flush()
sys.stdout = Logger(path)
# Strip the: "https://www.youtube.com/channel/"
channel_id_input = input()
channel_id = channel_id_input.strip("https://www.youtube.com/channel/")
videos = scrapetube.get_channel(channel_id)
for video in videos:
print("https://www.youtube.com/watch?v="+str(video['videoId']))
# print(video['videoId'])

Related

Youtube Data API error: Subtitles are disabled for this video

Im trying to get the Title and transcript of all videos of a playlist:
from googleapiclient.discovery import build
from youtube_transcript_api import YouTubeTranscriptApi
import os
api_key = "*********************************"
#1.query API
rq = build("youtube", "v3", developerKey=api_key).playlistItems().list(
part="contentDetails, snippet",
playlistId="PL-osiE80TeTtoQCKZ03TU5fNfx2UY6U4p",
maxResults=50,
).execute()
#2.Create a list with video Ids and Titles
vid_ids = []
vid_title = []
for item in rq["items"]:
vid_ids.append(item["contentDetails"]["videoId"])
vid_title.append(item["snippet"]["title"])
#3.Get transcripts
srt = YouTubeTranscriptApi.get_transcripts(vid_ids)
print(srt)
But I get an error because one or more of those videos have no subtitles:
Could not retrieve a transcript for the video https://www.youtube.com/watch?v=D2lwk1Ukgz0! This is most likely caused by:
Subtitles are disabled for this video
What would you code in python to avoid this error, and get the transcripts of at least the rest of the videos of the playlist? maybe an If Statement (if the video has no subtitles jump to the next one) or similar?
Thanks in advance.
Try and except should help here:
for id in vid_ids:
try:
srt = YouTubeTranscriptApi.get_transcripts(id)
except:
print(f"{id} doesn't have a transcript")
This will basically ignore exceptions and tell you which id doesn't have a transcript.
Just try one video id at a time to get its transcript using try/except and pay attention not to pass directly a video id but instead an array of one video id to YouTubeTranscriptApi.get_transcripts otherwise it doesn't work.
So change:
#3.Get transcripts
srt = YouTubeTranscriptApi.get_transcripts(vid_ids)
print(srt)
For:
#3.Get transcripts
srt = []
for vid_id in vid_ids:
try:
srt += [YouTubeTranscriptApi.get_transcripts([vid_id])]
except:
srt += [({vid_id => []}, [])]
print(srt)

How to save/display giphy gif using python API?

I am creating one of those cool moving photograph frames, eventually with my own pictures, but for now I just want to search giphy and save/display a gif.
Here's the code I gathered would be useful from their API.
import giphy_client as gc
from giphy_client.rest import ApiException
from random import randint
api_instance = gc.DefaultApi()
api_key = 'MY_API_KEY'
query = 'art'
fmt = 'gif'
try:
response = api_instance.gifs_search_get(api_key,query,limit=1,offset=randint(1,10),fmt=fmt)
gif_id = response.data[0]
except ApiException:
print("Exception when calling DefaultApi->gifs_search_get: %s\n" % e)
with open('test.txt','w') as f:
f.write(type(gif_id))
I get an object of type: class 'giphy_client.models.gif.Gif', I want to save this gif and display it on a monitor. I understand that I am a far way off on this but I am still learning about API and how to use them. If anyone can help me find a way to save this gif or display it directly from their website, that would be much appreciated!
Welcome dbarth!
I see your code does successfully retrieve a random image, that is good.
There are 3 steps needed to get the image:
Get the GIF URL.
That giphy_client client you are using, is made with Swagger, so, you can access the REST Response elements like any other object, or print them.
For example:
>>> print(gif_id.images.downsized.url)
'https://media0.giphy.com/media/l3nWlvtvAFHcDFKXm/giphy-downsized.gif?cid=e1bb72ff5c7dc1c67732476c2e69b2ff'
Note that when I print this, I get an URL. The Gif object you got, called gif_id, has a bunch of URLs to download the GIF or MP4 at different resolutions. In this case, I went with the downsized GIF. You can see all the elements retrieved using print(gif_id)
So, I will add this to your code:
gif_url = gif_id.images.downsized.url
Download the GIF
Now that you have a URL, it's time to download the GIF. I will use the requests library to do this, install it with pip if you don't have in your environment. Seems that you already tried to do this, but with an error.
import requests
[...]
with open('test.gif','wb') as f:
f.write(requests.get(url_gif).content)
Display the GIF
There are a bunch of GUIs for Python to do this, or you can even invoke a browser to show it. You need to investigate which GUI adapts better to your needs. For this case, I will use the example posted here, with a few modifications,to display the Gif using TKinter. Install Tkinter if isn't included with your Python installation.
Final code:
import giphy_client as gc
from giphy_client.rest import ApiException
from random import randint
import requests
from tkinter import *
import time
import os
root = Tk()
api_instance = gc.DefaultApi()
api_key = 'YOUR_OWN_API_KEY'
query = 'art'
fmt = 'gif'
try:
response = api_instance.gifs_search_get(api_key,query,limit=1,offset=randint(1,10),fmt=fmt)
gif_id = response.data[0]
url_gif = gif_id.images.downsized.url
except ApiException:
print("Exception when calling DefaultApi->gifs_search_get: %s\n" % e)
with open('test.gif','wb') as f:
f.write(requests.get(url_gif).content)
frames = []
i = 0
while True: # Add frames until out of range
try:
frames.append(PhotoImage(file='test.gif',format = 'gif -index %i' %(i)))
i = i + 1
except TclError:
break
def update(ind): # Display and loop the GIF
if ind >= len(frames):
ind = 0
frame = frames[ind]
ind += 1
label.configure(image=frame)
root.after(100, update, ind)
label = Label(root)
label.pack()
root.after(0, update, 0)
root.mainloop()
Keep learning how to use a REST API, and Swagger, if you want to keep using the giphy_client library. If not, you can make the requests directly using the requests library.

Posting to FB group with requests, allow youtube video to load

I made a simple python script that posts a random youtube video and a quote to Facebook group(s).
The problem is, that it doesn't give Facebook the time to load the random video. To be more specific, at the moment the post looks like this:
But I want it to look like this:
My current code looks like this (I omitted sensitive data):
""" Song of the day script """
import facebook
import os
from pyquery import PyQuery
import requests
import random
class Sofy(object):
GROUPS = ["123", "123"]
FB_ACCESS_TOKEN = "123accesstoken"
PLAYLISTS = ["123youtubeplaylist"]
VIDEOS = []
def get_video(self):
req = requests.get("https://www.youtube.com/playlist?list={}".format(self.PLAYLISTS[0]))
pq = PyQuery(req.text)
for video in pq(".pl-video").items():
self.VIDEOS.append(video.attr("data-video-id"))
return "https://www.youtube.com/watch?v={}".format(random.choice(self.VIDEOS[-5:]))
def get_qoute(self):
pwd = os.path.dirname(os.path.realpath(__file__))
fx = pwd + '/quotes.txt'
lines = open(fx).read().splitlines()
return random.choice(lines)
def run(self):
quote = self.get_qoute()
video = self.get_video()
graph = facebook.GraphAPI(access_token=self.FB_ACCESS_TOKEN, version='2.2')
for group in self.GROUPS:
graph.put_object(group, "feed", message="{}\n Song of the day: {}".format(quote, video))
print "All done :)"
if __name__=='__main__':
sofy = Sofy()
sofy.run()
I tried doing this with Selenium but it didn't quote work as expected. Also, this way looks cleaner, but I can't figure out how to let youtube video load, I'm not even sure if it's possible?
It doesn't look like you're actually sharing the link correctly, looks like you're adding the URL into the 'message' parameter -
It should be attached correctly if you specify it in the 'link' parameter

Python script for "Google search by image"

I have checked Google Search API's and it seems that they have not released any API for searching "Images". So, I was wondering if there exists a python script/library through which I can automate the "search by image feature".
This was annoying enough to figure out that I thought I'd throw a comment on the first python-related stackoverflow result for "script google image search". The most annoying part of all this is setting up your proper application and custom search engine (CSE) in Google's web UI, but once you have your api key and CSE, define them in your environment and do something like:
#!/usr/bin/env python
# save top 10 google image search results to current directory
# https://developers.google.com/custom-search/json-api/v1/using_rest
import requests
import os
import sys
import re
import shutil
url = 'https://www.googleapis.com/customsearch/v1?key={}&cx={}&searchType=image&q={}'
apiKey = os.environ['GOOGLE_IMAGE_APIKEY']
cx = os.environ['GOOGLE_CSE_ID']
q = sys.argv[1]
i = 1
for result in requests.get(url.format(apiKey, cx, q)).json()['items']:
link = result['link']
image = requests.get(link, stream=True)
if image.status_code == 200:
m = re.search(r'[^\.]+$', link)
filename = './{}-{}.{}'.format(q, i, m.group())
with open(filename, 'wb') as f:
image.raw.decode_content = True
shutil.copyfileobj(image.raw, f)
i += 1
There is no API available but you are can parse the page and imitate the browser, but I don't know how much data you need to parse because google may limit or block access.
You can imitate the browser by simply using urllib and setting correct headers, but if you think parsing complex web-pages may be difficult from python, you can directly use a headless browser like phontomjs, inside a browser it is trivial to get correct elements using javascript/DOM
Note before trying all this check google's TOS
You can try this:
https://developers.google.com/image-search/v1/jsondevguide#json_snippets_python
It's deprecated, but seems to work.

How can I take a screenshot/image of a website using Python?

What I want to achieve is to get a website screenshot from any website in python.
Env: Linux
Here is a simple solution using webkit:
http://webscraping.com/blog/Webpage-screenshots-with-webkit/
import sys
import time
from PyQt4.QtCore import *
from PyQt4.QtGui import *
from PyQt4.QtWebKit import *
class Screenshot(QWebView):
def __init__(self):
self.app = QApplication(sys.argv)
QWebView.__init__(self)
self._loaded = False
self.loadFinished.connect(self._loadFinished)
def capture(self, url, output_file):
self.load(QUrl(url))
self.wait_load()
# set to webpage size
frame = self.page().mainFrame()
self.page().setViewportSize(frame.contentsSize())
# render image
image = QImage(self.page().viewportSize(), QImage.Format_ARGB32)
painter = QPainter(image)
frame.render(painter)
painter.end()
print 'saving', output_file
image.save(output_file)
def wait_load(self, delay=0):
# process app events until page loaded
while not self._loaded:
self.app.processEvents()
time.sleep(delay)
self._loaded = False
def _loadFinished(self, result):
self._loaded = True
s = Screenshot()
s.capture('http://webscraping.com', 'website.png')
s.capture('http://webscraping.com/blog', 'blog.png')
Here is my solution by grabbing help from various sources. It takes full web page screen capture and it crops it (optional) and generates thumbnail from the cropped image also. Following are the requirements:
Requirements:
Install NodeJS
Using Node's package manager install phantomjs: npm -g install phantomjs
Install selenium (in your virtualenv, if you are using that)
Install imageMagick
Add phantomjs to system path (on windows)
import os
from subprocess import Popen, PIPE
from selenium import webdriver
abspath = lambda *p: os.path.abspath(os.path.join(*p))
ROOT = abspath(os.path.dirname(__file__))
def execute_command(command):
result = Popen(command, shell=True, stdout=PIPE).stdout.read()
if len(result) > 0 and not result.isspace():
raise Exception(result)
def do_screen_capturing(url, screen_path, width, height):
print "Capturing screen.."
driver = webdriver.PhantomJS()
# it save service log file in same directory
# if you want to have log file stored else where
# initialize the webdriver.PhantomJS() as
# driver = webdriver.PhantomJS(service_log_path='/var/log/phantomjs/ghostdriver.log')
driver.set_script_timeout(30)
if width and height:
driver.set_window_size(width, height)
driver.get(url)
driver.save_screenshot(screen_path)
def do_crop(params):
print "Croping captured image.."
command = [
'convert',
params['screen_path'],
'-crop', '%sx%s+0+0' % (params['width'], params['height']),
params['crop_path']
]
execute_command(' '.join(command))
def do_thumbnail(params):
print "Generating thumbnail from croped captured image.."
command = [
'convert',
params['crop_path'],
'-filter', 'Lanczos',
'-thumbnail', '%sx%s' % (params['width'], params['height']),
params['thumbnail_path']
]
execute_command(' '.join(command))
def get_screen_shot(**kwargs):
url = kwargs['url']
width = int(kwargs.get('width', 1024)) # screen width to capture
height = int(kwargs.get('height', 768)) # screen height to capture
filename = kwargs.get('filename', 'screen.png') # file name e.g. screen.png
path = kwargs.get('path', ROOT) # directory path to store screen
crop = kwargs.get('crop', False) # crop the captured screen
crop_width = int(kwargs.get('crop_width', width)) # the width of crop screen
crop_height = int(kwargs.get('crop_height', height)) # the height of crop screen
crop_replace = kwargs.get('crop_replace', False) # does crop image replace original screen capture?
thumbnail = kwargs.get('thumbnail', False) # generate thumbnail from screen, requires crop=True
thumbnail_width = int(kwargs.get('thumbnail_width', width)) # the width of thumbnail
thumbnail_height = int(kwargs.get('thumbnail_height', height)) # the height of thumbnail
thumbnail_replace = kwargs.get('thumbnail_replace', False) # does thumbnail image replace crop image?
screen_path = abspath(path, filename)
crop_path = thumbnail_path = screen_path
if thumbnail and not crop:
raise Exception, 'Thumnail generation requires crop image, set crop=True'
do_screen_capturing(url, screen_path, width, height)
if crop:
if not crop_replace:
crop_path = abspath(path, 'crop_'+filename)
params = {
'width': crop_width, 'height': crop_height,
'crop_path': crop_path, 'screen_path': screen_path}
do_crop(params)
if thumbnail:
if not thumbnail_replace:
thumbnail_path = abspath(path, 'thumbnail_'+filename)
params = {
'width': thumbnail_width, 'height': thumbnail_height,
'thumbnail_path': thumbnail_path, 'crop_path': crop_path}
do_thumbnail(params)
return screen_path, crop_path, thumbnail_path
if __name__ == '__main__':
'''
Requirements:
Install NodeJS
Using Node's package manager install phantomjs: npm -g install phantomjs
install selenium (in your virtualenv, if you are using that)
install imageMagick
add phantomjs to system path (on windows)
'''
url = 'http://stackoverflow.com/questions/1197172/how-can-i-take-a-screenshot-image-of-a-website-using-python'
screen_path, crop_path, thumbnail_path = get_screen_shot(
url=url, filename='sof.png',
crop=True, crop_replace=False,
thumbnail=True, thumbnail_replace=False,
thumbnail_width=200, thumbnail_height=150,
)
These are the generated images:
Full web page screen
Cropped image from captured screen
Thumbnail of a cropped image
can do using Selenium
from selenium import webdriver
DRIVER = 'chromedriver'
driver = webdriver.Chrome(DRIVER)
driver.get('https://www.spotify.com')
screenshot = driver.save_screenshot('my_screenshot.png')
driver.quit()
https://sites.google.com/a/chromium.org/chromedriver/getting-started
On the Mac, there's webkit2png and on Linux+KDE, you can use khtml2png. I've tried the former and it works quite well, and heard of the latter being put to use.
I recently came across QtWebKit which claims to be cross platform (Qt rolled WebKit into their library, I guess). But I've never tried it, so I can't tell you much more.
The QtWebKit links shows how to access from Python. You should be able to at least use subprocess to do the same with the others.
11 years later...
Taking a website screenshot using Python3.6 and Google PageSpeedApi Insights v5:
import base64
import requests
import traceback
import urllib.parse as ul
# It's possible to make requests without the api key, but the number of requests is very limited
url = "https://duckgo.com"
urle = ul.quote_plus(url)
image_path = "duckgo.jpg"
key = "xxxxxxxxxxxxxxxxxxxxxxxxxxxxxx"
strategy = "desktop" # "mobile"
u = f"https://www.googleapis.com/pagespeedonline/v5/runPagespeed?key={key}&strategy={strategy}&url={urle}"
try:
j = requests.get(u).json()
ss_encoded = j['lighthouseResult']['audits']['final-screenshot']['details']['data'].replace("data:image/jpeg;base64,", "")
ss_decoded = base64.b64decode(ss_encoded)
with open(image_path, 'wb+') as f:
f.write(ss_decoded)
except :
print(traceback.format_exc())
exit(1)
Notes:
Live Demo
Pros: Free
Cons: Low Resolution
Get API Key
Docs
Limits:
Queries per day = 25,000
Queries per 100 seconds = 400
Using Rendertron is an option. Under the hood, this is a headless Chrome exposing the following endpoints:
/render/:url: Access this route e.g. with requests.get if you are interested in the DOM.
/screenshot/:url: Access this route if you are interested in a screenshot.
You would install rendertron with npm, run rendertron in one terminal, access http://localhost:3000/screenshot/:url and save the file, but a demo is available at render-tron.appspot.com making it possible to run this Python3 snippet locally without installing the npm package:
import requests
BASE = 'https://render-tron.appspot.com/screenshot/'
url = 'https://google.com'
path = 'target.jpg'
response = requests.get(BASE + url, stream=True)
# save file, see https://stackoverflow.com/a/13137873/7665691
if response.status_code == 200:
with open(path, 'wb') as file:
for chunk in response:
file.write(chunk)
I can't comment on ars's answer, but I actually got Roland Tapken's code running using QtWebkit and it works quite well.
Just wanted to confirm that what Roland posts on his blog works great on Ubuntu. Our production version ended up not using any of what he wrote but we are using the PyQt/QtWebKit bindings with much success.
Note: The URL used to be: http://www.blogs.uni-osnabrueck.de/rotapken/2008/12/03/create-screenshots-of-a-web-page-using-python-and-qtwebkit/ I've updated it with a working copy.
This is an old question and most answers are a bit dated.
Currently, I would do 1 of 2 things.
1. Create a program that takes the screenshots
I would use Pyppeteer to take screenshots of websites. This runs on the Puppeteer package. Puppeteer spins up a headless chrome browser, so the screenshots will look exactly like they would in a normal browser.
This is taken from the pyppeteer documentation:
import asyncio
from pyppeteer import launch
async def main():
browser = await launch()
page = await browser.newPage()
await page.goto('https://example.com')
await page.screenshot({'path': 'example.png'})
await browser.close()
asyncio.get_event_loop().run_until_complete(main())
2. Use a screenshot API
You could also use a screenshot API such as this one.
The nice thing is that you don't have to set everything up yourself but can simply call an API endpoint.
This is taken from the screenshot API's documentation:
import urllib.parse
import urllib.request
import ssl
ssl._create_default_https_context = ssl._create_unverified_context
# The parameters.
token = "YOUR_API_TOKEN"
url = urllib.parse.quote_plus("https://example.com")
width = 1920
height = 1080
output = "image"
# Create the query URL.
query = "https://screenshotapi.net/api/v1/screenshot"
query += "?token=%s&url=%s&width=%d&height=%d&output=%s" % (token, url, width, height, output)
# Call the API.
urllib.request.urlretrieve(query, "./example.png")
Using a web service s-shot.ru (so it's not so fast), but quite easy to set up what need through the link configuration.
And you can easily capture full page screenshots
import requests
import urllib.parse
BASE = 'https://mini.s-shot.ru/1024x0/JPEG/1024/Z100/?' # you can modify size, format, zoom
url = 'https://stackoverflow.com/'#or whatever link you need
url = urllib.parse.quote_plus(url) #service needs link to be joined in encoded format
print(url)
path = 'target1.jpg'
response = requests.get(BASE + url, stream=True)
if response.status_code == 200:
with open(path, 'wb') as file:
for chunk in response:
file.write(chunk)
You can use Google Page Speed API to achieve your task easily. In my current project, I have used Google Page Speed API`s query written in Python to capture screenshots of any Web URL provided and save it to a location. Have a look.
import urllib2
import json
import base64
import sys
import requests
import os
import errno
# The website's URL as an Input
site = sys.argv[1]
imagePath = sys.argv[2]
# The Google API. Remove "&strategy=mobile" for a desktop screenshot
api = "https://www.googleapis.com/pagespeedonline/v1/runPagespeed?screenshot=true&strategy=mobile&url=" + urllib2.quote(site)
# Get the results from Google
try:
site_data = json.load(urllib2.urlopen(api))
except urllib2.URLError:
print "Unable to retreive data"
sys.exit()
try:
screenshot_encoded = site_data['screenshot']['data']
except ValueError:
print "Invalid JSON encountered."
sys.exit()
# Google has a weird way of encoding the Base64 data
screenshot_encoded = screenshot_encoded.replace("_", "/")
screenshot_encoded = screenshot_encoded.replace("-", "+")
# Decode the Base64 data
screenshot_decoded = base64.b64decode(screenshot_encoded)
if not os.path.exists(os.path.dirname(impagepath)):
try:
os.makedirs(os.path.dirname(impagepath))
except OSError as exc:
if exc.errno != errno.EEXIST:
raise
# Save the file
with open(imagePath, 'w') as file_:
file_.write(screenshot_decoded)
Unfortunately, following are the drawbacks. If these do not matter, you can proceed with Google Page Speed API. It works well.
The maximum width is 320px
According to Google API Quota, there is a limit of 25,000 requests per day
You don't mention what environment you're running in, which makes a big difference because there isn't a pure Python web browser that's capable of rendering HTML.
But if you're using a Mac, I've used webkit2png with great success. If not, as others have pointed out there are plenty of options.
I created a library called pywebcapture that wraps selenium that will do just that:
pip install pywebcapture
Once you install with pip, you can do the following to easily get full size screenshots:
# import modules
from pywebcapture import loader, driver
# load csv with urls
csv_file = loader.CSVLoader("csv_file_with_urls.csv", has_header_bool, url_column, optional_filename_column)
uri_dict = csv_file.get_uri_dict()
# create instance of the driver and run
d = driver.Driver("path/to/webdriver/", output_filepath, delay, uri_dict)
d.run()
Enjoy!
https://pypi.org/project/pywebcapture/
Try this..
#!/usr/bin/env python
import gtk.gdk
import time
import random
while 1 :
# generate a random time between 120 and 300 sec
random_time = random.randrange(120,300)
# wait between 120 and 300 seconds (or between 2 and 5 minutes)
print "Next picture in: %.2f minutes" % (float(random_time) / 60)
time.sleep(random_time)
w = gtk.gdk.get_default_root_window()
sz = w.get_size()
print "The size of the window is %d x %d" % sz
pb = gtk.gdk.Pixbuf(gtk.gdk.COLORSPACE_RGB,False,8,sz[0],sz[1])
pb = pb.get_from_drawable(w,w.get_colormap(),0,0,0,0,sz[0],sz[1])
ts = time.time()
filename = "screenshot"
filename += str(ts)
filename += ".png"
if (pb != None):
pb.save(filename,"png")
print "Screenshot saved to "+filename
else:
print "Unable to get the screenshot."
import subprocess
def screenshots(url, name):
subprocess.run('webkit2png -F -o {} {} -D ./screens'.format(name, url),
shell=True)

Categories