Getting length of YouTube video (without downloading the video itself)

I need to figure out the simplest way to get the length of a YouTube video programmatically, given the URL of the video.
Is the YouTube API the best method? It looks somewhat complicated and I've never used it before, so it would likely take me a while to get acclimated, but I really just want the quickest solution. I took a glance through the source of a video page in the hope that it might list the length there, but apparently it does not (though it lists recommended video times in a very nice list that would be easy to parse). If the API is the best method, does anyone have a snippet?
Ideally I could get this done in Python, and I need it to ultimately be in the format of
00:00:00.000
but I'm completely open to any solutions anyone may have.
I'd appreciate any insight.

All you have to do is read the seconds attribute in the yt:duration element from the XML returned by Youtube API 2.0. You only end up with seconds resolution (no milliseconds yet). Here's an example:
from datetime import timedelta
from urllib2 import urlopen
from xml.dom.minidom import parseString
for vid in ('wJ4hPaNyHnY', 'dJ38nHlVE78', 'huXaL8qj2Vs'):
    url = 'https://gdata.youtube.com/feeds/api/videos/{0}?v=2'.format(vid)
    s = urlopen(url).read()
    d = parseString(s)
    e = d.getElementsByTagName('yt:duration')[0]
    a = e.attributes['seconds']
    v = int(a.value)
    t = timedelta(seconds=v)
    print(t)
And the output is:
0:00:59
0:02:24
0:04:49
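The question asks for the 00:00:00.000 format, which timedelta does not print by default. A minimal sketch of one way to format a number of seconds that way; the function name format_duration is made up for illustration, and the milliseconds are always zero when the input only has whole-second resolution:

```python
def format_duration(total_seconds):
    """Format a duration given in seconds as HH:MM:SS.mmm."""
    # Work in integer milliseconds to avoid floating point surprises.
    total_ms = int(round(total_seconds * 1000))
    hours, remainder = divmod(total_ms, 3600 * 1000)
    minutes, remainder = divmod(remainder, 60 * 1000)
    seconds, millis = divmod(remainder, 1000)
    return "{:02d}:{:02d}:{:02d}.{:03d}".format(hours, minutes, seconds, millis)


print(format_duration(59))   # 00:00:59.000
print(format_duration(289))  # 00:04:49.000
```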

(I'm not sure what "pre-download" refers to.)
The simplest way to get the length of VIDEO_ID is to make an HTTP request for
http://gdata.youtube.com/feeds/api/videos/VIDEO_ID?v=2&alt=jsonc
and then look at the value of the data->duration element that's returned. It will be set to the video's duration in seconds.

With Python and the YouTube Data API v3, this approach works for any video.
You need an API key, which you can get here: https://console.developers.google.com/
# -*- coding: utf-8 -*-
import json
import urllib
video_id="6_zn4WCeX0o"
api_key="Your API KEY replace it!"
searchUrl="https://www.googleapis.com/youtube/v3/videos?id="+video_id+"&key="+api_key+"&part=contentDetails"
response = urllib.urlopen(searchUrl).read()
data = json.loads(response)
all_data=data['items']
contentDetails=all_data[0]['contentDetails']
duration=contentDetails['duration']
print duration
Console response:
>>>PT6M22S
Corresponds to 6 minutes and 22 seconds.

You can always utilize Data API v3 for this.
Just do a videos->list call.
GET https://www.googleapis.com/youtube/v3/videos?part=contentDetails%2C+fileDetails&id={VIDEO_ID}&key={YOUR_API_KEY}
In the response, read contentDetails.duration, which is in ISO 8601 format.
Or you can get the duration in milliseconds from fileDetails.durationMs.

If you're using Python 3.6 or newer (the examples use f-strings), you can perform a GET request against the YouTube v3 API URL. For this you will need to enable the YouTube v3 API in your Google Console and then create an API credential.
Code examples below:
import json
import requests
YOUTUBE_ID = 'video_id_here'
API_KEY = 'your_youtube_v3_api_key'
url = f"https://www.googleapis.com/youtube/v3/videos?part=contentDetails&id={YOUTUBE_ID}&key={API_KEY}"
response = requests.get(url) # Perform the GET request
data = response.json() # Read the json response and convert it to a Python dictionary
length = data['items'][0]['contentDetails']['duration']
print(length)
Or as a reusable function:
import json
import requests
API_KEY = 'your_youtube_v3_api_key'
def get_youtube_video_duration(video_id):
    url = f"https://www.googleapis.com/youtube/v3/videos?part=contentDetails&id={video_id}&key={API_KEY}"
    response = requests.get(url) # Perform the GET request
    data = response.json() # Read the json response and convert it to a Python dictionary
    return data['items'][0]['contentDetails']['duration']

duration = get_youtube_video_duration('your_video_id')
Note: You can only get fileDetails from the API if you own the video, so you'll need to use the same Google account for your YouTube v3 API key as your YouTube account.
The response from Google will look something like this:
{
    "kind": "youtube#videoListResponse",
    "etag": "\"SJajsdhlkashdkahdkjahdskashd4/meCiVqMhpMVdDhIB-dj93JbqLBE\"",
    "pageInfo": {
        "totalResults": 1,
        "resultsPerPage": 1
    },
    "items": [
        {
            "kind": "youtube#video",
            "etag": "\"SJZWTasdasd12389ausdkhaF94/aklshdaksdASDASddjsa12-18FQ\"",
            "id": "your_video_id",
            "contentDetails": {
                "duration": "PT4M54S",
                "dimension": "2d",
                "definition": "hd",
                "caption": "false",
                "licensedContent": false,
                "projection": "rectangular"
            }
        }
    ]
}
Here the video duration is PT4M54S, which means 4 minutes 54 seconds.
Edit: To convert the YouTube duration to seconds, see this answer: https://stackoverflow.com/a/49976787/2074077
Once you have converted the duration to seconds, you can convert the seconds into your format with a timedelta.
from datetime import timedelta
time = timedelta(seconds=duration_in_seconds)
print(time)
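For the conversion step itself, here is a minimal sketch that parses the ISO 8601 duration strings shown above (such as PT4M54S) into seconds with a regular expression. The function name is made up for illustration, and the pattern only covers the days/hours/minutes/seconds forms; week, month, and year designators are assumed not to occur and are rejected:

```python
import re
from datetime import timedelta

# Matches durations such as PT4M54S, PT1H2M3S or P1DT2H (assumption: no
# weeks, months, or years appear in the duration).
_DURATION_RE = re.compile(
    r"^P(?:(?P<days>\d+)D)?"
    r"(?:T(?:(?P<hours>\d+)H)?(?:(?P<minutes>\d+)M)?(?:(?P<seconds>\d+)S)?)?$"
)


def iso8601_duration_to_seconds(duration):
    """Convert an ISO 8601 duration string to a number of seconds."""
    match = _DURATION_RE.match(duration)
    if match is None:
        raise ValueError("Unsupported duration: {!r}".format(duration))
    # Missing components (for example no hours) count as zero.
    parts = {name: int(value) if value else 0
             for name, value in match.groupdict().items()}
    return (parts["days"] * 86400 + parts["hours"] * 3600
            + parts["minutes"] * 60 + parts["seconds"])


duration_in_seconds = iso8601_duration_to_seconds("PT4M54S")
print(duration_in_seconds)                     # 294
print(timedelta(seconds=duration_in_seconds))  # 0:04:54
```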

Related

I can't get binance Futures order book historical data

I'm trying to get binance Futures order history data using API. So I asked for data from binance, got the answer "Your application for historical futures order book data has been approved, please follow our Github guidance to access with your whitelisted account API key" and I have set up the API as follows.
And I have modified the Enable Symbol Whitelist like this:
The next step, I followed Github guidance: https://github.com/binance/binance-public-data/tree/master/Futures_Order_Book_Download
which has the following sample code:
"""
This example python script shows how to download the Historical Future Order Book level 2 Data via API.
The data download API is part of the Binance API (https://binance-docs.github.io/apidocs/spot/en/#general-api-information).
For how to use it, you may find info there with more examples, especially SIGNED Endpoint security as in https://binance-docs.github.io/apidocs/spot/en/#signed-trade-user_data-and-margin-endpoint-security
Before executing this file, please note:
- The API account needs to have a Futures account to access Futures data.
- The API key has been whitelisted to access the data.
- Read the comments section in this file to know where you should specify your request values.
"""
# Install the following required packages
import requests
import time
import hashlib
import hmac
from urllib.parse import urlencode
S_URL_V1 = "https://api.binance.com/sapi/v1"
# Specify the api_key and secret_key with your API Key and secret_key
api_key = "your_api_key"
secret_key = "your_secret_key"
# Specify the four input parameters below:
symbol = "ADAUSDT" # specify the symbol name
startTime = 1635561504914 # specify the starttime
endTime = 1635561604914 # specify the endtime
dataType = "T_DEPTH" # specify the dataType to be downloaded
# Function to generate the signature
def _sign(params={}):
    data = params.copy()
    ts = str(int(1000 * time.time()))
    data.update({"timestamp": ts})
    h = urlencode(data)
    h = h.replace("%40", "#")
    b = bytearray()
    b.extend(secret_key.encode())
    signature = hmac.new(b, msg=h.encode("utf-8"), digestmod=hashlib.sha256).hexdigest()
    sig = {"signature": signature}
    return data, sig

# Function to generate the download ID
def post(path, params={}):
    sign = _sign(params)
    query = urlencode(sign[0]) + "&" + urlencode(sign[1])
    url = "%s?%s" % (path, query)
    header = {"X-MBX-APIKEY": api_key}
    resultPostFunction = requests.post(url, headers=header, timeout=30, verify=True)
    return resultPostFunction

# Function to generate the download link
def get(path, params):
    sign = _sign(params)
    query = urlencode(sign[0]) + "&" + urlencode(sign[1])
    url = "%s?%s" % (path, query)
    header = {"X-MBX-APIKEY": api_key}
    resultGetFunction = requests.get(url, headers=header, timeout=30, verify=True)
    return resultGetFunction
"""
Beginning of the execution.
The final output will be:
- A link to download the specific data you requested with the specific parameters.
Sample output will be like the following: {'expirationTime': 1635825806, 'link': 'https://bin-prod-user-rebate-bucket.s3.amazonaws.com/future-data-download/XXX'
Copy the link to the browser and download the data. The link would expire after the expirationTime (usually 24 hours).
- A message reminding you to re-run the code and download the data hours later.
Sample output will be like the following: {'link': 'Link is preparing; please request later. Notice: when date range is very large (across months), we may need hours to generate.'}
"""
timestamp = str(int(1000 * time.time()))  # current timestamp which serves as an input for the params variable
paramsToObtainDownloadID = {
    "symbol": symbol,
    "startTime": startTime,
    "endTime": endTime,
    "dataType": dataType,
    "timestamp": timestamp,
}
# Calls the "post" function to obtain the download ID for the specified symbol, dataType and time range combination
path = "%s/futuresHistDataId" % S_URL_V1
resultDownloadID = post(path, paramsToObtainDownloadID)
print(resultDownloadID)
downloadID = resultDownloadID.json()["id"]
print(downloadID) # prints the download ID, example: {'id': 324225}
# Calls the "get" function to obtain the download link for the specified symbol, dataType and time range combination
paramsToObtainDownloadLink = {"downloadId": downloadID, "timestamp": timestamp}
pathToObtainDownloadLink = "%s/downloadLink" % S_URL_V1
resultToBeDownloaded = get(pathToObtainDownloadLink, paramsToObtainDownloadLink)
print(resultToBeDownloaded)
print(resultToBeDownloaded.json())
I have modified api_key and secret_key to my own keys and this is the result I got.
Can you tell me where I made a mistake? Thanks in advance for the answer.

Look at https://www.binance.com/en-NG/landing/data.
Futures Order Book Data Available only on Binance Futures. It requires
futures account be whitelisted first and can only be download via API.
Orderbook snapshot (S_Depth): Since January 2020, only on BTC/USDT
symbol. Tick-level orderbook (T_Depth): Since January 2020, on all
symbols
The page says you should apply through the Binance form to be whitelisted in the futures section here:
https://docs.google.com/forms/d/e/1FAIpQLSexCgyvZEMI1pw1Xj6gwKtfQTYUbH5HrUQ0gwgPZtM9FaM2Hw/viewform
Second thing - you are interested in futures, not spot, so the request should go to the futures API (fapi.binance.com/fapi) instead of api.binance.com/sapi
Third thing - API endpoint for order book is
GET /fapi/v1/depth
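A minimal sketch of calling that endpoint, assuming the USD-M futures REST host https://fapi.binance.com and the symbol and limit query parameters. The function names are made up for illustration. Note that this returns the current order book, not the historical T_DEPTH download the question is about:

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumption: base URL of the USD-M futures REST API.
FUTURES_BASE_URL = "https://fapi.binance.com"


def build_depth_url(symbol, limit=100):
    """Build the URL for the public order book endpoint."""
    query = urlencode({"symbol": symbol, "limit": limit})
    return "{}/fapi/v1/depth?{}".format(FUTURES_BASE_URL, query)


def fetch_depth(symbol, limit=100):
    """Fetch the current order book; no API key is sent for this call."""
    with urlopen(build_depth_url(symbol, limit), timeout=30) as response:
        return json.load(response)


print(build_depth_url("ADAUSDT", 5))
```

Calling fetch_depth("ADAUSDT", 5) performs the actual network request and returns the parsed JSON.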

CloudKit Server-to-Server auth: Keep getting 401 Authentication failed

I have recently been exploring CloudKit and related frameworks. I got the communication working with my app, as well as with my website using CloudKitJS. Where I am struggling is the Server-to-Server communication (which I would need for exporting data from the public database as CSV).
I have tried the Python package requests-cloudkit, which others were suggesting. I have created a Server-to-Server token, and copied only the key between the START and END lines when creating the eckey.pem file. I then got this code:
from requests_cloudkit import CloudKitAuth
from restmapper import restmapper
import json
KEY_ID = '[my key ID from CK Dashboard]'
SECRET_FILE_KEY = 'eckey.pem'
AUTH = CloudKitAuth(KEY_ID, SECRET_FILE_KEY)
PARAMS = {
    'query':{
        'recordType': '[my record type]'
    },
}
CloudKit = restmapper.RestMapper("https://api.apple-cloudkit.com/database/1/[my container]/development/")
cloudkit = CloudKit(auth=AUTH)
response = cloudkit.POST.public.records.query(json.dumps(PARAMS))
I am then getting the 401 Authentication failed response. I am stuck on this for days, so I would be grateful for any help or advice. 😊

Creating the server-to-server key is an important first step, but in order to make HTTP requests after that, you have to sign each request.
Look for the Authenticate Web Service Requests section near the bottom of this documentation page.
It's a little bit convoluted, but you have to carefully construct signed headers to include with each request you make. I'm not familiar with how to do it in Python, but here's how I do it in NodeJS which may help:
//Modules used below (KEY_ID and SECRET_FILE_KEY are assumed to be defined elsewhere)
const crypto = require('crypto')
const fs = require('fs')
const moment = require('moment')

//Get the timestamp in a very specific format
let date = moment().utc().format('YYYY-MM-DD[T]HH:mm:ss[Z]')
//Construct the subpath
let endpoint = '/records/lookup'
let path = '/database/1/iCloud.*****/development/public'
let subpath = path+endpoint
//Get the key file
let privateKeyFile = fs.readFileSync('../../'+SECRET_FILE_KEY, 'utf8')
//Make a string out of your JSON query
let query = {
  recordType: '[my record type]'
}
let requestBody = JSON.stringify(query)
//Hash the query
let bodyHash = crypto.createHash('sha256').update(requestBody, 'utf8').digest('base64')
//Assemble the components you just generated in a special format
//[Current date]:[Base64 SHA-256 hash of the request body]:[Web service URL subpath]
let message = date+':'+bodyHash+':'+subpath
//Sign it
let signature = crypto.createSign('RSA-SHA256').update(message).sign(privateKeyFile, 'base64')
//Assemble your headers and include them in your HTTP request
let headers = {
  'X-Apple-CloudKit-Request-KeyID': KEY_ID,
  'X-Apple-CloudKit-Request-ISO8601Date': date,
  'X-Apple-CloudKit-Request-SignatureV1': signature
}
This is a bit hairy at first, but I just put all this stuff in a function that I reuse whenever I need to make a request.
Apple's documentation has pretty much been abandoned and it's hard to find good help with CloudKit Web Services these days.
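Since the question is about Python, here is a rough sketch of the first half of the NodeJS steps above (timestamp, body hash, and the message to sign) using only the standard library. The function name and the iCloud.example container are made up for illustration. The signing step itself is left out because it needs a third-party crypto library, so this only shows how the message is assembled:

```python
import base64
import hashlib
import json
from datetime import datetime, timezone


def build_signing_message(request_body, subpath, now=None):
    """Return (date, message) where message is date:bodyHash:subpath."""
    now = now or datetime.now(timezone.utc)
    # Same timestamp format as the NodeJS example: 2016-01-25T22:15:43Z
    date = now.strftime("%Y-%m-%dT%H:%M:%SZ")
    # Base64-encoded SHA-256 hash of the request body
    digest = hashlib.sha256(request_body.encode("utf-8")).digest()
    body_hash = base64.b64encode(digest).decode("ascii")
    return date, "{}:{}:{}".format(date, body_hash, subpath)


body = json.dumps({"query": {"recordType": "[my record type]"}})
subpath = "/database/1/iCloud.example/development/public/records/query"
date, message = build_signing_message(body, subpath)
print(message)
```

The message then has to be signed with the private key and sent in the X-Apple-CloudKit-Request-SignatureV1 header, together with the same date and body that were used to build it.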

Calendar info using confluence API

I have a Confluence page which has a Calendar inside it (please check photo below).
Calendar
I am trying to pull information from this calendar, such as how many events there are on each day, nothing more.
I used code from Stack Overflow that reads a Confluence page using the API, but the JSON response does not contain any data about the calendar inside the page.
import requests
import json
from requests.auth import HTTPDigestAuth
confluence_host = "https://confluence.tools.mycompany.com"
url = confluence_host + '/rest/api/content/'
page_id = "36013799"
page = requests.get(url=url + page_id,
                    params={'expand': 'body.storage'},
                    auth=('my_user', 'my_password')).json()
Even if I write html = page['body']['storage']['value'] and check its output, it only gives this:
name="calendar" ac:schema-version="1" ac:macro-id="99a26d73-abaa-45a1-92cc-0edafec567f5">72da4ae5-4888-46dd-9078-0299b51ab815,743a55b4-7b3b-4e00-b102-90d95916de8d
Is there any way to get the calendar info ?
Thanks

You are using Team Calendar in your page, and Team Calendar is a plugin. Technically, /rest/api/content only gives you the content of the page, not the content of plugins. As far as I know, Team Calendar doesn't have a public REST API, as you can see in CONFSERVER-51323, but you can get the data you want from the database instead, since Team Calendar creates a couple of AO tables in your database.

I found it easiest to get the subscribe link to the calendar then use an iCalendar library to parse the data. Make sure the subscribe button gives you a link with a {guid}.ics and not undefined.ics - To solve that I had to go to the calendars link in the main confluence space and then select it from the dropdown. You may have to create an empty calendar so you can select a cal.
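As a rough illustration of the subscribe-link approach, the sketch below counts events per day from the text of a downloaded .ics file. It uses only the standard library instead of an iCalendar library, so it is an assumption-laden shortcut: it reads the DTSTART line of each VEVENT and ignores recurrence rules, multi-day events, and folded lines. The function name and the sample data are made up:

```python
from collections import Counter


def count_events_per_day(ics_text):
    """Count VEVENT entries per start date (YYYY-MM-DD) in iCalendar text."""
    counts = Counter()
    in_event = False
    for raw_line in ics_text.splitlines():
        line = raw_line.strip()
        if line == "BEGIN:VEVENT":
            in_event = True
        elif line == "END:VEVENT":
            in_event = False
        elif in_event and line.startswith("DTSTART"):
            # Handles DTSTART:20181128T150000Z and DTSTART;VALUE=DATE:20181128
            value = line.split(":", 1)[1]
            day = value[:8]
            counts["{}-{}-{}".format(day[:4], day[4:6], day[6:8])] += 1
    return counts


sample = """BEGIN:VCALENDAR
BEGIN:VEVENT
DTSTART:20181128T150000Z
SUMMARY:First
END:VEVENT
BEGIN:VEVENT
DTSTART;VALUE=DATE:20181128
SUMMARY:Second
END:VEVENT
BEGIN:VEVENT
DTSTART;TZID=America/Mexico_City:20181129T090000
SUMMARY:Third
END:VEVENT
END:VCALENDAR"""

for day, total in sorted(count_events_per_day(sample).items()):
    print(day, total)
```

For real calendars with recurring events, a proper iCalendar library is the safer choice, as suggested above.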

I got this working by looking at the GET and PUT requests: there is a REST API used by the JavaScript plugin (rest/calendar-services/1.0/calendar/events.json).
You need to find out your subCalendarId:
urlC = 'https://yourconfluence.com/rest/calendar-services/1.0/calendar/events.json?subCalendarId=40xxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx&userTimeZoneId=America%2FMexico_City&start=2018-11-28T00%3A00%3A00Z&end=2018-11-28T00%3A00%3A00Z'
r = requests.get(urlC, auth=("myuser", "mypass"), timeout=15)
that will return all the records on that period:
a = r.json()
a.keys()
[u'events', u'success']
a['success']
True
type(a['events'])
list
len(a['events'])
61
Use the following data in a PUT to add new events:
data = {
    "subCalendarId": "xxx-xxx-xxx",
    "eventType": "custom",
    "customEventTypeId": "xxx-xxx-xxx",
    "what": "My Test",
    "person": "xxxxxxxxxxxxxxxxx",
    "startDate": "28-Nov-2018",
    "startTime": "15:00",
    "endDate": "28-Nov-2018",
    "endTime": "16:00",
    "allDayEvent": "false",
    "editAllInRecurrenceSeries": "true",
    "where": "Some Place",
    "description": "My testing Case",
    "userTimeZoneId": "America/Mexico_City",
}
urlC = 'https://yourconfluence.com/rest/calendar-services/1.0/calendar/events.json'
r = requests.put(urlC, auth=('username', 'pass'), data=data, timeout=15)
that will return a 'success': true with the new entry:
u'{"success":true,"subCalendar":{"reminderMe":false,........}}
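Instead of percent-encoding the GET URL by hand, the query string can be built with urlencode. This is a small sketch using the same parameter names as the URL above; the function name and the sample values are made up:

```python
from urllib.parse import urlencode


def build_events_url(base_url, sub_calendar_id, start, end, time_zone):
    """Build the events.json URL for one sub-calendar and time range."""
    query = urlencode({
        "subCalendarId": sub_calendar_id,
        "userTimeZoneId": time_zone,
        "start": start,
        "end": end,
    })
    return "{}/rest/calendar-services/1.0/calendar/events.json?{}".format(
        base_url.rstrip("/"), query)


print(build_events_url(
    "https://yourconfluence.com",
    "your-sub-calendar-id",
    "2018-11-28T00:00:00Z",
    "2018-11-29T00:00:00Z",
    "America/Mexico_City",
))
```

The resulting URL can be passed to requests.get exactly as in the example above.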

After going through the developer console in Chrome and analyzing the format of the payload and the required authentication details, I found a solution for this.
My problem was a little different: I had to add an event to a Confluence calendar. Adding an event and extracting events follow the same process.
A few cookies are required for authentication, such as JSESSIONID and seraph.confluence, which are stored under Application -> Cookies in the Chrome developer tools.
We also need the subCalendarId and the event type ID, which can be taken from Application -> Local Storage in the Chrome developer tools.
Confluence sends the request with 'application/x-www-form-urlencoded' as the Content-Type, so the data has to be URL-encoded.
The code below converts a value to that format:
import urllib.parse
urllib.parse.quote_plus('May 4, 2022')
output:
'May+4%2C+2022'
The date and time should be in the MMM D, YYYY and h:mm A formats; you can use the arrow Python package to produce them:
arrow.utcnow().format('MMM D, YYYY')
output
May 4, 2022
Below is the request payload Chrome sends in the PUT request when you click Add Event:
confirmRemoveInvalidUsers=false&childSubCalendarId=&customEventTypeId=asdfghjk-asdf-asdf-asfg-sdfghjssdfgh&eventType=custom&isSingleJiraDate=false&originalSubCalendarId=&originalStartDate=&originalEventType=&originalCustomEventTypeId=&recurrenceId=&subCalendarId=asdfghjk-asdf-asdf-asdg-asdfghjkl&uid=&what=test&startDate=May+4%2C+2022&startTime=&endDate=May+4%2C+2022&endTime=&allDayEvent=true&rruleStr=&until=&editAllInRecurrenceSeries=true&where=&url=&description=&userTimeZoneId=America%2FNew_York
After analysing it, we can conclude that we have to replace the start date, end date, start and end time, what, where, sub-calendar ID, event type, and other fields with our own values in code and send the request.
Below is the code which will do that
def addEventtoCalender():
    reqUrl = 'https://confluence.yourdomain.com/rest/calendar-services/1.0/calendar/events.json'
    authDetails = getConfluenceAuthenticationDetails()
    what=urllib.parse.quote_plus('WHAT field data')
    startDate = urllib.parse.quote_plus(arrow.utcnow().format('MMM D, YYYY'))
    # In arrow, 'mm' is minutes; 'MM' would insert the month number
    startTime=urllib.parse.quote_plus(arrow.utcnow().format('h:mm A'))
    endDate=urllib.parse.quote_plus(arrow.utcnow().format('MMM D, YYYY'))
    endTime=urllib.parse.quote_plus(arrow.utcnow().shift(hours=+1).format('h:mm A'))
    where=urllib.parse.quote_plus('WHERE field data')
    url=urllib.parse.quote_plus('https://yoururl.com')
    description=urllib.parse.quote_plus('test test test')
    customEventTypeId = authDetails['CONFLUENCE_CUSTOM_EVENT_TYPE_ID'] #subcalender type
    subCalendarId = authDetails['CONFLUENCE_SUBCALENDAR_ID']
    seraphConfluence = authDetails['CONFLUENCE_SERAPH_CONFLUENCE']
    JSESSIONID = authDetails['CONFLUENCE_JSESSION_ID']
    data = f'confirmRemoveInvalidUsers=false&childSubCalendarId=&customEventTypeId={customEventTypeId}&eventType=custom&isSingleJiraDate=false&originalSubCalendarId=&originalStartDate=&originalEventType=&originalCustomEventTypeId=&recurrenceId=&subCalendarId={subCalendarId}&uid=&what={what}&startDate={startDate}&startTime={startTime}&endDate={endDate}&endTime={endTime}&allDayEvent=false&rruleStr=&until=&editAllInRecurrenceSeries=true&where={where}&url={url}&description={description}&userTimeZoneId=America%2FNew_York'
    headers = {
        'Content-Type': 'application/x-www-form-urlencoded',
        'Cookie': f'seraph.confluence={seraphConfluence}; JSESSIONID={JSESSIONID}'
    }
    res = requests.put(url=reqUrl,data=data,headers=headers,verify=False)
    return res
Above code will replicate the whole process of adding event to calendar. You can use the same approach to replicate getting all event between particular dates.
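If you would rather not depend on arrow for the date and time strings, the same "May 4, 2022" and "3:00 PM" style values can be produced with the standard library. This is a sketch with made-up helper names; the month names are hard-coded in English so the result does not depend on the system locale:

```python
from datetime import datetime
from urllib.parse import quote_plus

_MONTHS = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
           "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]


def confluence_date(dt):
    """Format a datetime like 'May 4, 2022' (no zero padding on the day)."""
    return "{} {}, {}".format(_MONTHS[dt.month - 1], dt.day, dt.year)


def confluence_time(dt):
    """Format a datetime like '3:00 PM' (12-hour clock, no leading zero)."""
    hour = dt.hour % 12 or 12
    suffix = "AM" if dt.hour < 12 else "PM"
    return "{}:{:02d} {}".format(hour, dt.minute, suffix)


when = datetime(2022, 5, 4, 15, 0)
print(quote_plus(confluence_date(when)))  # May+4%2C+2022
print(quote_plus(confluence_time(when)))  # 3%3A00+PM
```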

get my videos duration with youtube api

I am using this YouTube API sample to get the duration of my uploaded videos. In the resource representation at https://developers.google.com/youtube/v3/docs/videos#snippet I can see the structure of the JSON, but I can't get the duration part.
So far I have managed to get contentDetails with 'videoPublishedAt'; it looks like this: ({u'videoPublishedAt': u'2013-03-13T00:05:41.000Z', u'videoId': u'6PKHl3Kvppk'})
I added 'contentDetails' to 'part':
playlistitems_list_request = youtube.playlistItems().list(
    playlistId=uploads_list_id,
    part="snippet,contentDetails",
    maxResults=50
)
And then changed the assignment to video_id = playlist_item["contentDetails"] in the last section, inside while playlistitems_list_request:
But video_id = playlist_item["contentDetails"]["duration"] gives KeyError: 'duration'
Here is the full code without the authentication part and imports. The full version can be found here: https://github.com/youtube/api-samples/blob/master/python/my_uploads.py
# Retrieve the contentDetails part of the channel resource for the
# authenticated user's channel.
channels_response = youtube.channels().list(
    mine=True,
    part="contentDetails"
).execute()
for channel in channels_response["items"]:
    # From the API response, extract the playlist ID that identifies the list
    # of videos uploaded to the authenticated user's channel.
    uploads_list_id = channel["contentDetails"]["relatedPlaylists"]["uploads"]
    print "Videos in list %s" % uploads_list_id
    # Retrieve the list of videos uploaded to the authenticated user's channel.
    playlistitems_list_request = youtube.playlistItems().list(
        playlistId=uploads_list_id,
        part="snippet,contentDetails",
        maxResults=50
    )
    while playlistitems_list_request:
        playlistitems_list_response = playlistitems_list_request.execute()
        # Print information about each video.
        for playlist_item in playlistitems_list_response["items"]:
            title = playlist_item["snippet"]["title"]
            video_id = playlist_item["contentDetails"]
            print "%s (%s)" % (title, video_id)
        playlistitems_list_request = youtube.playlistItems().list_next(
            playlistitems_list_request, playlistitems_list_response)
    print

Here's how to get the duration. Use the Videos.list method, which you can try in the YouTube API Explorer, and then implement the same call in your code.
I supplied the videoId of the video for the id parameter and contentDetails for part.
A successful response returned the duration of my video along with other metadata:
"contentDetails": {
    "duration": "PT1M28S",
    "dimension": "2d",
    "definition": "hd",
    "caption": "false",
    "licensedContent": false,
    "projection": "rectangular",
    "hasCustomThumbnail": false
}
Here, it's 1 minute and 28 seconds.
Check Youtube Videos.list for additional reference.
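In the question's code, this means collecting the videoId values from the playlist items and passing them to videos().list. A minimal sketch, assuming youtube is the same authenticated client object as in the question and that each call is kept to at most 50 IDs; the helper names are made up for illustration:

```python
def chunked(items, size):
    """Yield consecutive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def fetch_durations(youtube, video_ids):
    """Return a dict mapping each video ID to its ISO 8601 duration."""
    durations = {}
    # Assumption: keep each videos().list call to 50 IDs or fewer.
    for batch in chunked(list(video_ids), 50):
        response = youtube.videos().list(
            part="contentDetails",
            id=",".join(batch),
        ).execute()
        for item in response.get("items", []):
            durations[item["id"]] = item["contentDetails"]["duration"]
    return durations
```

Inside the playlist loop, the ID to collect is playlist_item["contentDetails"]["videoId"]; the returned values look like PT1M28S.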

How to enter a parameter for a HTTP GET request that doesn't have a name?

I'm using the requests module for python, and sending a GET request to a site as follows:
r = requests.get("https://www.youtube.com", params={"search_query":"Hello World"}).text
Which just returns the HTML of the page on YouTube that searches for "Hello World", which is the parameter for a field with the name "search_query".
However, let's say that one parameter I want to input does not have a name on the site, but is still part of the form.
The site I'm talking about has the following code:
<input type="text" id="youtube-url" value="http://www.youtube.com/watch?v=KMU0tzLwhBE" onclick="sALL(this)" autocomplete="off" style="width:466px;">
How would I go about sending a parameter to this specific input, considering it does not have a name?
Thanks
EDIT: The full HTML of the code:

This site doesn't do any normal submitting, everything is done via javascript.
When you push the button a GET request is sent like this:
"/a/pushItem/?item=" + escape(g("youtube-url").value)
+ "&el=na&bf=" + getBF()
+ "&r="+ (new Date()).getTime();
Then with the result of this, another is sent:
"/a/itemInfo/?video_id=" + video_id + "&ac=www&t=grp&r=" + a.getTime();
So in python you can try this:
import time
import requests

videoid = requests.get("http://www.youtube-mp3.org/a/pushItem/",
    params={
        "item": "your youtube video url",
        "el": "na",
        "bf": "false",
        "r": int(time.time() * 1000)  # JS Date.getTime() returns milliseconds
    }).text
info = requests.get("http://www.youtube-mp3.org/a/itemInfo/",
    params={
        "video_id": videoid,
        "ac": "www",
        "t": "grp",
        "r": int(time.time() * 1000)
    }).text
And then you'll have to parse the info, which isn't even JSON, but more javascript, and do whatever you want with that data.
You might have to deal with CAPTCHAs or conversion progress.
