How can I get the song name from an internet radio stream?
I looked at Python: Get name of shoutcast/internet radio station from url, but that only covers getting the name of the radio station. How can I get the name of the currently playing song? Here is the stream link I want to get the song name from: http://pool.cdn.lagardere.cz/fm-evropa2-128
How should I do it? Can you please help me?
To get the stream title, you need to request metadata. See the SHOUTcast/Icecast protocol description:
#!/usr/bin/env python
from __future__ import print_function
import re
import struct
import sys
try:
    import urllib2
except ImportError:  # Python 3
    import urllib.request as urllib2

url = 'http://pool.cdn.lagardere.cz/fm-evropa2-128'  # radio stream
encoding = 'latin1'  # default: iso-8859-1 for mp3 and utf-8 for ogg streams

request = urllib2.Request(url, headers={'Icy-MetaData': '1'})  # request metadata
response = urllib2.urlopen(request)
print(response.headers, file=sys.stderr)
metaint = int(response.headers['icy-metaint'])
for _ in range(10):  # title may be empty initially, try several times
    response.read(metaint)  # skip to metadata
    metadata_length = struct.unpack('B', response.read(1))[0] * 16  # length byte
    metadata = response.read(metadata_length).rstrip(b'\0')
    print(metadata, file=sys.stderr)
    # extract title from the metadata
    m = re.search(br"StreamTitle='([^']*)';", metadata)
    if m:
        title = m.group(1)
        if title:
            break
else:
    sys.exit('no title found')
print(title.decode(encoding, errors='replace'))
The stream title is empty in this case.
Related
I built a simple RSS reader in Python, and it is not working.
In addition, I want to get the featured image source link of every post, and I didn't find a way to do so.
It shows me this error:
Traceback (most recent call last):
  File "RSS_reader.py", line 7, in <module>
    feed_title = feed['feed']['title']
There are some other RSS feeds that work fine, so I don't understand why some RSS feeds work and others don't.
I would like to understand why the code doesn't work and also how to get the featured image source link of a post.
I attached the code; it is written in Python 3.7.
import feedparser
import webbrowser

feed = feedparser.parse("https://finance.yahoo.com/rss/")
feed_title = feed['feed']['title']
feed_entries = feed.entries

for entry in feed.entries:
    article_title = entry.title
    article_link = entry.link
    article_published_at = entry.published  # Unicode string
    article_published_at_parsed = entry.published_parsed  # Time object
    article_author = entry.author
    content = entry.summary
    article_tags = entry.tags
    print("{}[{}]".format(article_title, article_link))
    print("Published at {}".format(article_published_at))
    print("Published by {}".format(article_author))
    print("Content {}".format(content))
    print("catagory{}".format(article_tags))
A few things:
1) feed['feed']['title'] does not exist for this feed.
2) At least for this site, entry.author and entry.tags do not exist.
3) It seems feedparser is not fully compatible with Python 3.7 (it gives me KeyError: "object doesn't have key 'category'").
So as a starting point, try running the following code in Python 3.6 and go from there.
import feedparser
import webbrowser

feed = feedparser.parse("https://finance.yahoo.com/rss/")
# feed_title = feed['feed']['title']  # NOT VALID
feed_entries = feed.entries

for entry in feed.entries:
    article_title = entry.title
    article_link = entry.link
    article_published_at = entry.published  # Unicode string
    article_published_at_parsed = entry.published_parsed  # Time object
    # article_author = entry.author  # DOES NOT EXIST
    content = entry.summary
    # article_tags = entry.tags  # DOES NOT EXIST
    print("{}[{}]".format(article_title, article_link))
    print("Published at {}".format(article_published_at))
    # print("Published by {}".format(article_author))
    print("Content {}".format(content))
    # print("catagory{}".format(article_tags))
Good luck.
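Regarding the other part of the question, the featured image: none of the above covers it, and where the image lives depends on the feed. Below is a hedged sketch, assuming the feed publishes the image via Media RSS (feedparser maps media:content / media:thumbnail to entry.media_content / entry.media_thumbnail) or as an image enclosure; the Yahoo Finance feed may not expose one at all.
import feedparser

feed = feedparser.parse("https://finance.yahoo.com/rss/")
for entry in feed.entries:
    image_url = None
    # feedparser exposes <media:content> and <media:thumbnail> as these attributes
    if 'media_content' in entry and entry.media_content:
        image_url = entry.media_content[0].get('url')
    elif 'media_thumbnail' in entry and entry.media_thumbnail:
        image_url = entry.media_thumbnail[0].get('url')
    else:
        # some feeds expose the image as an enclosure link instead
        for link in entry.get('links', []):
            if link.get('rel') == 'enclosure' and link.get('type', '').startswith('image/'):
                image_url = link.get('href')
                break
    print(entry.title, image_url)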
You can also use XML parser libraries like BeautifulSoup (https://www.crummy.com/software/BeautifulSoup/bs4/doc/) and create custom parsers. A sample custom parser can be found here (https://github.com/vintageplayer/RSS-Parser), and a walk-through of it can be read here (https://towardsdatascience.com/rss-feed-parser-in-python-553b1857055c).
Though other libraries can be useful, BeautifulSoup is an extremely handy library to try out.
I have used BeautifulSoup for a beginner RSS feed reader project (you need to install lxml for it to work, since we are dealing with XML):
from bs4 import BeautifulSoup
import requests

url = requests.get('https://realpython.com/atom.xml')
soup = BeautifulSoup(url.content, 'xml')
entries = soup.find_all('entry')

for i in entries:
    title = i.title.text
    link = i.link['href']
    summary = i.summary.text
    print(f'Title: {title}\n\nSummary: {summary}\n\nLink: {link}\n\n------------------------\n')
You can find the YouTube video here:
https://www.youtube.com/watch?v=8HbqO-TfjlI
I wonder if it is possible to test the URLs from my bookmarks,
so I can see whether each URL is still online or offline.
I can see that I can test it with urllib2.
urllib2 code:
import socket
from urllib2 import urlopen, URLError, HTTPError

socket.setdefaulttimeout(23)  # timeout in seconds
url = 'http://google.com/'
try:
    response = urlopen(url)
except HTTPError, e:
    print 'The server couldn\'t fulfill the request. Reason:', str(e.code)
except URLError, e:
    print 'We failed to reach a server. Reason:', str(e.reason)
else:
    html = response.read()
    print 'got response!'
    # do something, turn the light on/off or whatever
My question is: can I get the links/URLs from my bookmarks (Chrome) and then test in a loop whether each URL is online or offline?
EDIT 26/02/2019...
Have tried this code, and get a file-not-found error:
import json
from jsonpath_rw import parse
import os

# Parse the Bookmarks file from json into a dict
input_filename = os.path.join(os.getenv("APPDATA"), "\\Local\\Google\\Chrome\\User Data\\Default\\Bookmarks")
if os.path.isfile(input_filename):
    with open(input_filename) as data_file:
        bookmark_data = json.load(data_file)
    # Set a jsonpath expression for all 'url' children
    expr = parse('$..url')
    # print the value of all url keys
    print([x.value for x in expr.find(bookmark_data)])
else:
    print("File not found!")
    print(input_filename)
Chrome (or at least Chromium) stores your bookmarks in a file called Bookmarks in your Chrome config area: on Linux this is usually .config/chromium/Default/Bookmarks; on Windows it is AppData\Local\Google\Chrome\User Data\Default\Bookmarks (though you may need to hunt for it if your system is different).
Assuming you want to check all links, you probably want to recursively walk the tree, looking for url keys and getting their values. Since this is JSON, I would recommend using the JSONPath library (https://readthedocs.org/projects/jsonpath-rw/) rather than writing your own recursion function:
import json
from jsonpath_rw import parse

# Parse the Bookmarks file from json into a dict
with open('Bookmarks') as bm:
    data = json.load(bm)

# Set a jsonpath expression for all 'url' children
expr = parse('$..url')

# print the value of all url keys
print([x.value for x in expr.find(data)])
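To answer the second half of the question (testing each bookmark in a loop), here is a hedged sketch building on the list above, assuming the requests library. On Windows the Bookmarks file is usually under %LOCALAPPDATA%\Google\Chrome\User Data\Default, and some servers reject HEAD requests, so you may need to fall back to GET.
import json
import os
import requests
from jsonpath_rw import parse

# On Windows, Chrome's Bookmarks file normally lives under %LOCALAPPDATA%
bookmarks_path = os.path.join(os.getenv("LOCALAPPDATA"),
                              "Google", "Chrome", "User Data", "Default", "Bookmarks")

with open(bookmarks_path) as bm:
    data = json.load(bm)

urls = [x.value for x in parse('$..url').find(data)]
for url in urls:
    try:
        # a HEAD request is usually enough to see whether the server answers
        response = requests.head(url, timeout=10, allow_redirects=True)
        status = 'online' if response.status_code < 400 else 'offline ({})'.format(response.status_code)
    except requests.RequestException as exc:
        status = 'offline ({})'.format(type(exc).__name__)
    print(url, status)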
How can I get the duration of a YouTube video? I am trying this...
import gdata.youtube
import gdata.youtube.service
yt_service = gdata.youtube.service.YouTubeService()
entry = yt_service.GetYouTubeVideoEntry(video_id='the0KZLEacs')
print 'Video title: %s' % entry.media.title.text
print 'Video duration: %s' % entry.media.duration.seconds
Console response
Traceback (most recent call last):
  File "/Users/LearningAnalytics/Dropbox/testing/youtube.py", line 8, in <module>
    entry = yt_service.GetYouTubeVideoEntry(video_id='the0KZLEacs')
  File "/Library/Python/2.7/site-packages/gdata/youtube/service.py", line 210, in GetYouTubeVideoEntry
    return self.Get(uri, converter=gdata.youtube.YouTubeVideoEntryFromString)
  File "/Library/Python/2.7/site-packages/gdata/service.py", line 1107, in Get
    'reason': server_response.reason, 'body': result_body}
gdata.service.RequestError: {'status': 410, 'body': 'No longer available', 'reason': 'Gone'}
Two ways to get a YouTube video's duration.
First way:
With Python and the v3 YouTube API, this works for every video. You need an API key, which you can get here: https://console.developers.google.com/
# -*- coding: utf-8 -*-
import json
import urllib
video_id="6_zn4WCeX0o"
api_key="Your API KEY replace it!"
searchUrl="https://www.googleapis.com/youtube/v3/videos?id="+video_id+"&key="+api_key+"&part=contentDetails"
response = urllib.urlopen(searchUrl).read()
data = json.loads(response)
all_data=data['items']
contentDetails=all_data[0]['contentDetails']
duration=contentDetails['duration']
print duration
Console Response:
>>>PT6M22S
Corresponds to 6 minutes and 22 seconds.
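If you need the duration as plain seconds rather than the ISO 8601 string, here is a minimal sketch using the standard re module (assuming the duration only contains hour/minute/second parts, which is the usual case for ordinary videos):
import re

def iso8601_duration_to_seconds(duration):
    # e.g. 'PT6M22S' -> 382, 'PT1H2M3S' -> 3723, 'PT45S' -> 45
    hours, minutes, seconds = re.match(
        r'PT(?:(\d+)H)?(?:(\d+)M)?(?:(\d+)S)?', duration).groups()
    return int(hours or 0) * 3600 + int(minutes or 0) * 60 + int(seconds or 0)

print(iso8601_duration_to_seconds('PT6M22S'))  # 382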
Second way:
Another way, though it does not work for all videos, is with the external package pafy:
import pafy
url = "http://www.youtube.com/watch?v=cyMHZVT91Dw"
video = pafy.new(url)
print video.length
I installed pafy from https://pypi.python.org/pypi/pafy/0.3.42
You can also try the following:
import json
import urllib.request

# video_id and YT_KEY (your API key) are assumed to be defined already
search_url = f'https://www.googleapis.com/youtube/v3/videos?id={video_id}&key={YT_KEY}&part=contentDetails'
req = urllib.request.Request(search_url)
response = urllib.request.urlopen(req).read().decode('utf-8')
data = json.loads(response)
all_data = data['items']
duration = all_data[0]['contentDetails']['duration']
minutes = int(duration[2:].split('M')[0])
seconds = int(duration[-3:-1])
This decodes the response using UTF-8 encoding, which allowed me to load it into a JSON variable.
There is a very useful library called pytube, where you can get a good amount of data from YouTube, such as the channel's name and the video's length; you can also download the video or get codecs, etc. Here's the documentation: https://pytube.io/en/latest/api.html
from pytube import YouTube
video = "youtube_url"
yt = YouTube(video) ## this creates a YOUTUBE OBJECT
video_length = yt.length ## this will return the length of the video in sec as an int
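As a quick illustration of the other things pytube exposes, here is a hedged sketch; the URL is a placeholder and the exact attributes may vary between pytube versions:
from pytube import YouTube

yt = YouTube("youtube_url")  # replace with a real video URL
print(yt.title)    # video title
print(yt.length)   # length in seconds
print(yt.author)   # channel name
# download the highest-resolution progressive stream into the current directory
yt.streams.get_highest_resolution().download()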
I am using the JSON library and trying to import a page feed into a CSV file. I tried many ways to get the result, but every time the code executes it gives "JSON not serializable". Note: Facebook uses an auth code, which I have and used, so the connection string will change; however, if you use a page whose privacy is public you will still be able to get the result from the code below.
Following is the code:
import urllib3
import json
import requests
#from pprint import pprint
import csv
from urllib.request import urlopen

page_id = "abcd"  # username or id
api_endpoint = "https://graph.facebook.com"
fb_graph_url = api_endpoint+"/"+page_id

try:
    #api_request = urllib3.Requests(fb_graph_url)
    #http = urllib3.PoolManager()
    #api_response = http.request('GET', fb_graph_url)
    api_response = requests.get(fb_graph_url)
    try:
        #print (list.sort(json.loads(api_response.read())))
        obj = open('data', 'w')
        # write(json_dat)
        f = api_response.content
        obj.write(json.dumps(f))
        obj.close()
    except Exception as ee:
        print(ee)
except Exception as e:
    print(e)
I tried many approaches but was not successful. I hope someone can help.
api_response.content is the raw body of the API response, not a Python object, so you won't be able to dump it.
Try either:
f = api_response.text  # the decoded text of the response
obj.write(f)
Or
f = api_response.json()
obj.write(json.dumps(f))
requests.get(fb_graph_url).content
is the raw response body, so using json.dumps on it won't work: that function expects a JSON-serializable object such as a list or a dictionary.
If the request already returns JSON, just write that text to the file, or parse it with response.json() first.
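A minimal sketch of the second option, assuming the Graph API call succeeds and actually returns JSON (the URL mirrors the one built in the question):
import json
import requests

fb_graph_url = "https://graph.facebook.com/abcd"   # as built in the question
api_response = requests.get(fb_graph_url)
data = api_response.json()                         # parse the JSON body into a Python dict
with open('data', 'w') as obj:
    json.dump(data, obj)                           # serialize the dict back out to the file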
I want to get all video URLs of a specific channel. I think JSON with Python or Java would be a good choice. I can get the newest video with the following code, but how can I get ALL video links (>500)?
import urllib, json
author = 'Youtube_Username'
inp = urllib.urlopen(r'http://gdata.youtube.com/feeds/api/videos?max-results=1&alt=json&orderby=published&author=' + author)
resp = json.load(inp)
inp.close()
first = resp['feed']['entry'][0]
print first['title'] # video title
print first['link'][0]['href'] #url
After the YouTube API change, max k.'s answer does not work. As a replacement, the function below provides a list of the YouTube videos in a given channel. Please note that you need an API key for it to work.
import urllib
import json

def get_all_video_in_channel(channel_id):
    api_key = 'YOUR_API_KEY'  # replace with your own API key

    base_video_url = 'https://www.youtube.com/watch?v='
    base_search_url = 'https://www.googleapis.com/youtube/v3/search?'

    first_url = base_search_url + 'key={}&channelId={}&part=snippet,id&order=date&maxResults=25'.format(api_key, channel_id)

    video_links = []
    url = first_url
    while True:
        inp = urllib.urlopen(url)
        resp = json.load(inp)

        for i in resp['items']:
            if i['id']['kind'] == "youtube#video":
                video_links.append(base_video_url + i['id']['videoId'])

        try:
            next_page_token = resp['nextPageToken']
            url = first_url + '&pageToken={}'.format(next_page_token)
        except KeyError:
            # no more pages
            break

    return video_links
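Usage would then look something like this (the channel id below is only a placeholder):
links = get_all_video_in_channel('UCxxxxxxxxxxxxxxxxxxxxxxxx')  # placeholder channel id
print(len(links))   # how many videos were found
print(links[:5])    # first few watch URLs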
Short answer:
Here's a library that can help with that:
pip install scrapetube
import scrapetube

videos = scrapetube.get_channel("UC9-y-6csu5WGm29I7JiwpnA")

for video in videos:
    print(video['videoId'])
Long answer:
The module mentioned above was created by me due to a lack of any other solutions. Here's what I tried:
Selenium. It worked, but had three big drawbacks: 1. It requires a web browser and driver to be installed. 2. It has big CPU and memory requirements. 3. It can't handle big channels.
Using youtube-dl. Like this:
import youtube_dl
youtube_dl_options = {
    'skip_download': True,
    'ignoreerrors': True
}

with youtube_dl.YoutubeDL(youtube_dl_options) as ydl:
    videos = ydl.extract_info(f'https://www.youtube.com/channel/{channel_id}/videos')
This also works for small channels, but for bigger ones I would get blocked by YouTube for making so many requests in such a short time (because youtube-dl downloads more info for every video in the channel).
So I made the library scrapetube, which uses the web API to get all the videos.
Increase max-results from 1 to however many you want, but beware they don't advise grabbing too many in one call and will limit you at 50 (https://developers.google.com/youtube/2.0/developers_guide_protocol_api_query_parameters).
Instead you could consider grabbing the data down in batches of 25, say, by changing the start-index until none came back.
EDIT: Here's the code for how I would do it
import urllib, json

author = 'Youtube_Username'

foundAll = False
ind = 1
videos = []
while not foundAll:
    inp = urllib.urlopen(r'http://gdata.youtube.com/feeds/api/videos?start-index={0}&max-results=50&alt=json&orderby=published&author={1}'.format(ind, author))
    try:
        resp = json.load(inp)
        inp.close()
        returnedVideos = resp['feed']['entry']
        for video in returnedVideos:
            videos.append(video)

        ind += 50
        print len(videos)
        if (len(returnedVideos) < 50):
            foundAll = True
    except:
        # catch the case where the number of videos in the channel is a multiple of 50
        print "error"
        foundAll = True

for video in videos:
    print video['title']  # video title
    print video['link'][0]['href']  # url
Based on the code found here and at some other places, I've written a small script that does this. My script uses v3 of Youtube's API and does not hit against the 500 results limit that Google has set for searches.
The code is available over at GitHub: https://github.com/dsebastien/youtubeChannelVideosFinder
An independent way of doing things: no API, no rate limit.
import requests

username = "marquesbrownlee"
url = "https://www.youtube.com/user/" + username + "/videos"
page = requests.get(url).content
data = str(page).split(' ')
item = 'href="/watch?'
vids = [line.replace('href="', 'youtube.com') for line in data if item in line]  # list of all videos listed twice
print(vids[0])  # index the latest video
The above code will scrape only a limited number of video URLs, at most around 60. How can I grab all the video URLs present in the channel? Can you please suggest something?
The above code snippet will only display the list of videos that are listed twice, not all the video URLs in the channel.
Using Selenium Chrome Driver:
from selenium import webdriver
from webdriver_manager.chrome import ChromeDriverManager
import time

driverPath = ChromeDriverManager().install()
driver = webdriver.Chrome(driverPath)

url = 'https://www.youtube.com/howitshouldhaveended/videos'
driver.get(url)

height = driver.execute_script("return document.documentElement.scrollHeight")
previousHeight = -1

while previousHeight < height:
    previousHeight = height
    driver.execute_script(f'window.scrollTo(0,{height + 10000})')
    time.sleep(1)
    height = driver.execute_script("return document.documentElement.scrollHeight")

vidElements = driver.find_elements_by_id('thumbnail')
vid_urls = []
for v in vidElements:
    vid_urls.append(v.get_attribute('href'))
This code has worked the few times I've tried it; however, you might need to tweak the sleep time, or add a way to recognize when the browser is still loading the extra information. It easily worked for me for getting a channel with 300+ videos, but it was having an issue with one that had 7000+ videos due to the time required to load the new videos on the browser becoming inconsistent.
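One possible tweak along those lines (an untested sketch, not part of the original answer): only stop scrolling once the page height has stayed the same for several consecutive polls, which tolerates slow loads better than a single comparison.
import time

def scroll_to_bottom(driver, pause=2, stable_polls=3):
    # keep scrolling until the document height is unchanged for `stable_polls` checks
    unchanged = 0
    last_height = driver.execute_script("return document.documentElement.scrollHeight")
    while unchanged < stable_polls:
        driver.execute_script(f"window.scrollTo(0, {last_height + 10000})")
        time.sleep(pause)
        new_height = driver.execute_script("return document.documentElement.scrollHeight")
        if new_height == last_height:
            unchanged += 1            # page may still be loading; give it a few more chances
        else:
            unchanged = 0             # new content appeared, reset the counter
            last_height = new_height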
I modified the script originally posted by dermasmid to fit my needs. This is the result:
import scrapetube
import sys

path = '_list.txt'
sys.stdout = open(path, 'w')

videos = scrapetube.get_channel("UC9-y-6csu5WGm29I7JiwpnA")

for video in videos:
    print("https://www.youtube.com/watch?v=" + str(video['videoId']))
    # print(video['videoId'])
Basically it saves all the URLs from the playlist into a "_list.txt" file. I am using this "_list.txt" file to download all the videos using yt-dlp.exe. All the downloaded files have the .mp4 extension.
Now I need to create another "_playlist.txt" file that contains all the FILENAMES corresponding to each URL from the "_list.txt".
For example, for "https://www.youtube.com/watch?v=yG1m7oGZC48" I want "Apple M1 Ultra & NUMA - Computerphile.mp4" as the output in the "_playlist.txt".
I made some further improvements, to be able to enter the channel URL in the console and print the result both on screen and into an external file called "_list.txt".
import scrapetube
import sys

path = '_list.txt'

print('**********************\n')
print("The result will be saved in '_list.txt' file.")
print("Enter Channel ID:")

# Prints the output in the console and into the '_list.txt' file.
class Logger:
    def __init__(self, filename):
        self.console = sys.stdout
        self.file = open(filename, 'w')

    def write(self, message):
        self.console.write(message)
        self.file.write(message)

    def flush(self):
        self.console.flush()
        self.file.flush()

sys.stdout = Logger(path)

# Remove the "https://www.youtube.com/channel/" prefix if a full URL was pasted
# (str.strip() removes characters, not a prefix, so replace() is used instead).
channel_id_input = input()
channel_id = channel_id_input.replace("https://www.youtube.com/channel/", "")

videos = scrapetube.get_channel(channel_id)

for video in videos:
    print("https://www.youtube.com/watch?v=" + str(video['videoId']))
    # print(video['videoId'])