Enable cache on ffmpeg to record streaming - python

I'm currently using streamlink and ffmpeg to record streams and save them to a file, but the saved video often has a lot of lag. I found this link https://www.reddit.com/r/Twitch/comments/62601b/laggy_stream_on_streamlinklivestreamer_but_not_on/
where they claim that the lag occurs because the cache is not enabled on the player.
I tried adding the options -hls_allow_cache allowcache -segment_list_flags cache, with the result that the ffmpeg process runs for about 8 seconds, then ends and immediately starts again without producing a video file. If I leave those two options out, the video is recorded correctly, but most of the time with some lag.
Of course, if I watch the stream in the browser I have no lag problem.
This is the code:
from subprocess import Popen
from streamlink import Streamlink, NoPluginError, PluginError

streamlink = Streamlink()
# this code is just a snippet, it is inside a while loop to restart the process
try:
    streams = streamlink.streams(m3u8_url)
    stream_url = streams['best'].url
    # note: the hls options do not seem to work
    ffmpeg_process = Popen(
        ["ffmpeg", "-hide_banner", "-loglevel", "panic", "-y", "-hls_allow_cache", "allowcache", "-segment_list_flags", "cache", "-i", stream_url, "-fs", "10M", "-c", "copy",
         "-bsf:a", "aac_adtstoasc", fileName])
except NoPluginError:
    pass  # handle exception
except PluginError:
    pass  # handle exception
except Exception as e:
    pass  # handle exception
What are the best options to enable the cache and limit the lag as much as possible?

You can read the FFmpeg StreamingGuide for more details on latency. For instance, there is
an option, -fflags nobuffer, which might help; it is usually used when
receiving streams to reduce latency.
The documentation describes nobuffer as follows:
Reduce the latency introduced by buffering during initial input
streams analysis.
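A minimal sketch of where that flag would go in the asker's command list. The helper name build_record_command and the "warning" log level are my own choices, not from the question; the only point being illustrated is that -fflags nobuffer is an input option and therefore has to appear before -i. Whether it actually removes the lag for a given stream is something you would have to test.

```python
def build_record_command(stream_url, file_name, low_latency=True):
    """Return an ffmpeg argument list that copies a stream to a file.

    Hypothetical helper: the name and defaults are assumptions for
    illustration only.
    """
    # "warning" instead of "panic" so that ffmpeg errors stay visible
    cmd = ["ffmpeg", "-hide_banner", "-loglevel", "warning", "-y"]
    if low_latency:
        # input options must come before the -i they apply to
        cmd += ["-fflags", "nobuffer"]
    cmd += ["-i", stream_url, "-c", "copy", "-bsf:a", "aac_adtstoasc", file_name]
    return cmd
```

The resulting list can be passed straight to Popen, e.g. Popen(build_record_command(stream_url, fileName)).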

I solved the lag problem by not using ffmpeg to save the video at all; instead I use streamlink directly and write an .mp4 file:
from streamlink import Streamlink, NoPluginError, PluginError, StreamError

streamlink = Streamlink()
try:
    streams = streamlink.streams(m3u8_url)
    stream = streams['480p']
    fd = stream.open()
    with open(fileName, "wb") as out:
        while True:
            data = fd.read(1024)
            if not data:  # an empty read means the stream has ended
                break
            out.write(data)
    fd.close()
except NoPluginError:
    pass  # handle exception
except PluginError:
    pass  # handle exception
except StreamError:
    pass  # handle exception
except Exception as e:
    pass  # handle exception


Python 3 urllib: 530 too many connections, in loop

I am retrieving data files from a FTP server in a loop with the following code:
response = urllib.request.urlopen(url)
data = response.read()
compressed_file = io.BytesIO(data)
gin = gzip.GzipFile(fileobj=compressed_file)
Retrieving and processing the first few files works fine, but after a few requests I get the following error:
530 Maximum number of connections exceeded.
I tried closing the connection (see code above) and using a sleep() timer, but neither worked. What am I doing wrong here?
Trying to make urllib do FTP properly makes my brain hurt. By default, it creates a new connection for each file, apparently without really properly ensuring the connections close.
ftplib is more appropriate I think.
Since I happen to be working on the same data you are(were)... Here is a very specific answer decompressing the .gz files and passing them into ish_parser (https://github.com/haydenth/ish_parser).
I think it is also clear enough to serve as a general answer.
import ftplib
import io
import gzip
import ish_parser # from: https://github.com/haydenth/ish_parser

ftp_host = "ftp.ncdc.noaa.gov"
parser = ish_parser.ish_parser()

# identifies what data to get
USAF_ID = '722950'
WBAN_ID = '23174'
YEARS = range(1975, 1980)

with ftplib.FTP(host=ftp_host) as ftpconn:
    ftpconn.login()  # anonymous login, one connection reused for every file
    for year in YEARS:
        ftp_file = "pub/data/noaa/{YEAR}/{USAF}-{WBAN}-{YEAR}.gz".format(USAF=USAF_ID, WBAN=WBAN_ID, YEAR=year)
        # read the whole file and save it to a BytesIO (stream)
        response = io.BytesIO()
        try:
            ftpconn.retrbinary('RETR '+ftp_file, response.write)
        except ftplib.error_perm as err:
            if str(err).startswith('550 '):
                print('ERROR:', err)
                continue  # file not available, move on to the next year
            raise
        # decompress and parse each line
        response.seek(0) # jump back to the beginning of the stream
        with gzip.open(response, mode='rb') as gzstream:
            for line in gzstream:
                parser.loads(line.decode('latin-1'))
This does read the whole file into memory, which could probably be avoided using some clever wrappers and/or yield or something... but works fine for a year's worth of hourly weather observations.
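One way the in-memory buffering might be avoided is to decompress each chunk as it arrives and hand out complete lines immediately. The sketch below is an assumption on my part, not something from the answer: GzipLineFeeder is a made-up class name, and it relies on zlib.decompressobj with wbits set to 16 + zlib.MAX_WBITS to accept gzip-framed data. It only handles a single gzip member per file.

```python
import gzip
import zlib


class GzipLineFeeder:
    """Decompress gzip data chunk by chunk and pass complete lines to a callback.

    Hypothetical helper for illustration; handles one gzip member only.
    """

    def __init__(self, on_line):
        # 16 + MAX_WBITS tells zlib to expect a gzip header and trailer
        self._decomp = zlib.decompressobj(16 + zlib.MAX_WBITS)
        self._pending = b""
        self._on_line = on_line

    def feed(self, chunk):
        """Accept one compressed chunk; usable as a retrbinary callback."""
        self._pending += self._decomp.decompress(chunk)
        # everything before the last newline is complete, the rest waits
        *lines, self._pending = self._pending.split(b"\n")
        for line in lines:
            self._on_line(line)

    def close(self):
        """Flush whatever is left after the final chunk."""
        self._pending += self._decomp.flush()
        if self._pending:
            self._on_line(self._pending)
            self._pending = b""
```

With this, the download step would become something like ftpconn.retrbinary('RETR ' + ftp_file, feeder.feed) followed by feeder.close(), with the callback decoding and parsing each line.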
Probably a pretty nasty workaround, but this worked for me. I made a script (here called test.py) which does the request (see code above). The code below is used in the loop I mentioned and calls test.py
from subprocess import call
with open('log.txt', 'a') as f:
    call(['python', 'test.py', args[0], args[1]], stdout=f)

Downloading Streams Simultaneously with Python 3.5

EDIT: I think I've figured out a solution using subprocess.Popen with separate .py files for each stream being monitored. It's not pretty, but it works.
I'm working on a script to monitor a streaming site for several different accounts and to record when they are online. I am using the livestreamer package for downloading a stream when it comes online, but the problem is that the program will only record one stream at a time. I have the program loop through a list and if a stream is online, start recording with subprocess.call(["livestreamer"... The problem is that once the program starts recording, it stops going through the loop and doesn't check or record any of the other livestreams. I've tried using Process and Thread, but none of these seem to work. Any ideas?
Code below. Asterisks are not literally part of code.
import os, urllib.request, time, subprocess, datetime, random

status = {
    # tag: online flag, one entry per monitored account
}

def gen_name(tag):
    return stuff  # <<Bunch of unimportant code stuff here.

def dl(tag):
    ...

def loopCheck():
    while True:
        for tag in status:
            data = urllib.request.urlopen("http://*******.com/" + tag + "/").read().decode()
            if data.find(".m3u8") != -1:
                print(tag + " is online!")
                if status[tag] == False:
                    status[tag] = True
                    dl(tag)
            else:
                print(tag + " is offline.")
                status[tag] = False
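The EDIT at the top mentions subprocess.Popen as the way out. The reason it helps is that subprocess.call waits for the child to exit, while Popen returns as soon as the child has started, so the checking loop can keep running. Below is a rough sketch of that idea inside a single script; the names running, start_recording and reap_finished are hypothetical, and the command list stands in for whatever livestreamer invocation is actually used.

```python
import subprocess
import sys
import time

running = {}  # tag -> Popen handle of the recorder started for that tag


def start_recording(tag, command):
    """Start command in the background unless tag is already being recorded."""
    proc = running.get(tag)
    if proc is not None and proc.poll() is None:
        return False  # the previous recorder for this tag is still alive
    running[tag] = subprocess.Popen(command)  # returns immediately
    return True


def reap_finished():
    """Forget recorders whose process has exited and return their tags."""
    done = [tag for tag, proc in running.items() if proc.poll() is not None]
    for tag in done:
        del running[tag]
    return done
```

Inside the monitoring loop, start_recording(tag, [...]) would replace the blocking call, and reap_finished() could be called once per pass to notice streams that have ended.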

Real-time data transfer from Python to MATLAB

I am using Python to read data from a USB input device. I would like to know if there is a way to exchange this data with a model in MATLAB in real time. At the moment I save the data to a .mat file and let the model read it from there, which is not very intuitive. The code I use for this is below:
# Import the needed libraries
import sys
import time
from array import *
import usb.core
import usb.util
import scipy.io

# find our device
dev = usb.core.find(idVendor=0x046d, idProduct=0xc29a)
# was it found?
if dev is None:
    raise ValueError('Device not found')
# set the active configuration. With no arguments, the first
# configuration will be the active one
try:
    dev.set_configuration()
# In the event of an error
except usb.core.USBError as e:
    print('Cannot set configuration the device: %s' % str(e))
# get an endpoint instance
cfg = dev.get_active_configuration()
intf = cfg[(0,0)]
ep = usb.util.find_descriptor(
    intf,
    # match the first IN endpoint
    custom_match = \
    lambda e: \
        usb.util.endpoint_direction(e.bEndpointAddress) == \
        usb.util.ENDPOINT_IN)
# Initialising variables
# Databases for access in MATLAB
# Read data from the device as long as it is connected
while dev is not None:
    try:
        # Read data
        data = dev.read(ep.bEndpointAddress, ep.wMaxPacketSize)
    except usb.core.USBError as e:
        print("Error reading data: %s" % str(e))
You have a few options.
You can poll from within MATLAB for the presence of a file, then read in the new data when it is available.
You can open a pipe to perform inter-process communication between Python and MATLAB (this also requires polling from the MATLAB side). See here for code.
You can use a local UDP or TCP socket for communication, either with PNET (which still requires polling) or with the MATLAB Instrument Control Toolbox (which allows you to configure a callback function).
Since MATLAB is single-threaded, your model will have to be designed with the arrival of new data in mind. You will need to explicitly trigger the model to re-evaluate when new data arrives.
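For the socket option, the Python side could look roughly like the sketch below. Everything specific here is an assumption for illustration: the port 5005, the choice of little-endian 8-byte doubles as the wire format, and the function names pack_sample and send_sample. The MATLAB side would need to listen on the same port and decode the same format.

```python
import socket
import struct


def pack_sample(values):
    """Encode a sequence of numbers as little-endian 8-byte doubles."""
    return struct.pack("<%dd" % len(values), *values)


def send_sample(sock, values, address=("127.0.0.1", 5005)):
    """Send one sample as a single UDP datagram (port 5005 is arbitrary)."""
    sock.sendto(pack_sample(values), address)
```

In the USB read loop one would create the socket once with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) and call send_sample(sock, data) after every successful read. UDP does not guarantee delivery, which may be acceptable for a live feed where only the newest sample matters.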

How to shutdown an httplib2 request when it is too long

I have a pretty annoying issue at the moment. When I make an httplib2 request for a page that is far too large, I would like to be able to stop it cleanly.
For example :
from httplib2 import Http
url = 'http://media.blubrry.com/podacademy/p/content.blubrry.com/podacademy/Neuroscience_and_Society_1.mp3'
h = Http(timeout=5)
h.request(url, 'GET')
In this example, the url is a podcast and it will keep being downloaded forever. My main process will hang indefinitely in this situation.
I have tried running the request in a separate thread with the code below and then simply deleting the thread object.
import Queue
from threading import Thread

def http_worker(url, q):
    h = Http()
    print 'Http worker getting %s' % url
    q.put(h.request(url, 'GET'))

def process(url):
    q = Queue.Queue()
    t = Thread(target=http_worker, args=(url, q))
    t.start()
    tid = t.ident
    t.join(10)  # give the request some time to finish (duration assumed)
    if t.isAlive():
        try:
            del t
            print 'deleting t'
        except: print 'error deleting t'
    else: print q.get()
    check_thread(tid)
Unfortunately, the thread is still active and will continue to consume cpu / memory.
def check_thread(tid):
    import sys
    print 'Thread id %s is still active ? %s' % (tid, tid in sys._current_frames().keys() )
Thank you.
OK, I found a hack to deal with this issue.
The best solution so far is to cap the amount of data read and to stop reading from the socket once the cap is reached. The data is read by the _safe_read method of the httplib module. To override this method, I used this lib: http://blog.rabidgeek.com/?tag=wraptools
And voilà:
from httplib import HTTPResponse, IncompleteRead, MAXAMOUNT
from wraptools import wraps

def _safe_read(original_method, self, amt):
    """Read the number of bytes requested, compensating for partial reads.

    Normally, we have a blocking socket, but a read() can be interrupted
    by a signal (resulting in a partial read).

    Note that we cannot distinguish between EOF and an interrupt when zero
    bytes have been read. IncompleteRead() will be raised in this
    situation.

    This function should be used when <amt> bytes "should" be present for
    reading. If the bytes are truly not available (due to EOF), then the
    IncompleteRead exception can be used to detect the problem.
    """
    # NOTE(gps): As of svn r74426 socket._fileobject.read(x) will never
    # return less than x bytes unless EOF is encountered. It now handles
    # signal interruptions (socket.error EINTR) internally. This code
    # never caught that exception anyways. It seems largely pointless.
    # self.fp.read(amt) will work fine.
    s = []
    total = 0
    MAX_FILE_SIZE = 3*10**6
    # stop as soon as the cap is reached, even if more data is pending
    while amt > 0 and total < MAX_FILE_SIZE:
        chunk = self.fp.read(min(amt, MAXAMOUNT))
        if not chunk:
            raise IncompleteRead(''.join(s), amt)
        s.append(chunk)
        total = total + len(chunk)
        amt -= len(chunk)
    return ''.join(s)
In this case, MAX_FILE_SIZE is set to 3 MB (3*10**6 bytes).
Hopefully, this will help others.
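The same idea of capping the amount of data read can be sketched without patching library internals, by reading the response in chunks and stopping at a limit. This is a Python 3 illustration rather than the httplib2 hack above; read_capped is a hypothetical helper and works on any object with a read(n) method.

```python
import io


def read_capped(fp, max_bytes, chunk_size=64 * 1024):
    """Read from fp until EOF or until max_bytes have been collected.

    Returns (data, hit_cap). hit_cap is True when reading stopped because
    the cap was reached, which also happens if the body is exactly
    max_bytes long.
    """
    chunks = []
    total = 0
    while total < max_bytes:
        chunk = fp.read(min(chunk_size, max_bytes - total))
        if not chunk:
            return b"".join(chunks), False  # EOF before the cap
        chunks.append(chunk)
        total += len(chunk)
    return b"".join(chunks), True
```

With the standard library this could be used as: with urllib.request.urlopen(url, timeout=5) as resp: data, hit_cap = read_capped(resp, 3 * 10**6). Leaving the with block closes the connection, so the rest of the podcast is never downloaded.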

urllib2 urlopen read timeout/block

Recently I have been working on a tiny crawler that downloads the images found at a URL.
I use urlopen() from urllib2 together with open()/f.write().
Here is the code snippet:
# the list for the images' urls
imglist = re.findall(regImg,pageHtml)
# iterate to download images
for index in xrange(1,len(imglist)+1):
    img = urllib2.urlopen(imglist[index-1])
    f = open(r'E:\OK\%s.jpg' % str(index), 'wb')
    print('To Read...')
    # potential timeout, may block for a long time
    # so I wonder whether there is any mechanism to enable retry when time exceeds a certain threshold
    f.write(img.read())
    f.close()
    print('Image %d is ready !' % index)
In the code above, img.read() can block for a long time; when that happens I would like to retry or re-open the image URL.
I am also concerned about the efficiency of the code above: if the number of images to download is large, using a thread pool seems better.
Any suggestions? Thanks in advance.
P.S. I found that the read() method on the img object may block, so adding a timeout parameter to urlopen() alone seems useless, and file objects have no read() with a timeout. Any suggestions on this? Thanks very much.
urllib2.urlopen has a timeout parameter, which is used for all blocking operations (connection buildup etc.).
This snippet is taken from one of my projects. I use a thread pool to download multiple files at once. It uses urllib.urlretrieve but the logic is the same. The url_and_path_list is a list of (url, path) tuples, the num_concurrent is the number of threads to be spawned, and the skip_existing skips downloading of files if they already exist in the filesystem.
import Queue
import threading
import urllib
from os.path import exists

def download_urls(url_and_path_list, num_concurrent, skip_existing):
    # prepare the queue
    queue = Queue.Queue()
    for url_and_path in url_and_path_list:
        queue.put(url_and_path)

    # start the requested number of download threads to download the files
    threads = []
    for _ in range(num_concurrent):
        t = DownloadThread(queue, skip_existing)
        t.daemon = True
        t.start()
        threads.append(t)

    # block until every queued item has been marked as done
    queue.join()

class DownloadThread(threading.Thread):
    def __init__(self, queue, skip_existing):
        super(DownloadThread, self).__init__()
        self.queue = queue
        self.skip_existing = skip_existing

    def run(self):
        while True:
            #grabs url from queue
            url, path = self.queue.get()

            if self.skip_existing and exists(path):
                # skip if requested
                self.queue.task_done()
                continue

            try:
                urllib.urlretrieve(url, path)
            except IOError:
                print "Error downloading url '%s'." % url

            #signals to queue job is done
            self.queue.task_done()
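For readers on Python 3, the same queue-plus-worker-threads logic can be sketched with concurrent.futures, which manages the queue and the threads internally. This is my own variant, not the answerer's code: fetch, download_all and the fetch_one parameter are hypothetical names, and the per-request timeout of 10 seconds is an arbitrary choice.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
import urllib.request


def fetch(url, path, timeout=10):
    """Download one URL to path and return the path."""
    with urllib.request.urlopen(url, timeout=timeout) as resp, open(path, "wb") as out:
        out.write(resp.read())
    return path


def download_all(url_and_path_list, num_concurrent=4, fetch_one=fetch):
    """Download (url, path) pairs in parallel; return (results, errors) dicts."""
    results, errors = {}, {}
    with ThreadPoolExecutor(max_workers=num_concurrent) as pool:
        futures = {pool.submit(fetch_one, url, path): url
                   for url, path in url_and_path_list}
        for future in as_completed(futures):
            url = futures[future]
            try:
                results[url] = future.result()
            except Exception as exc:
                errors[url] = exc  # keep going, report failures at the end
    return results, errors
```

Collecting failures in a dict instead of printing them makes it easy to retry only the URLs that failed.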
When you create the connection with urllib2.urlopen(), you can pass a timeout parameter.
As described in the docs:
The optional timeout parameter specifies a timeout in seconds for
blocking operations like the connection attempt (if not specified, the
global default timeout setting will be used). This actually only works
for HTTP, HTTPS and FTP connections.
With this you can cap the maximum waiting time and catch the exception that is raised.
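Catching that exception and retrying could look like the Python 3 sketch below. As far as I know the timeout is applied to the underlying socket, so a read() that stalls should raise a timeout as well, but treat that as an assumption to verify. fetch_with_retry and its opener parameter are hypothetical names; opener exists only so the function can be exercised without a network.

```python
import socket
import urllib.error
import urllib.request


def fetch_with_retry(url, timeout=10, attempts=3, opener=urllib.request.urlopen):
    """Download url, retrying when connecting or reading fails or times out."""
    last_error = None
    for _ in range(attempts):
        try:
            with opener(url, timeout=timeout) as response:
                return response.read()
        except (socket.timeout, urllib.error.URLError) as exc:
            last_error = exc  # remember the failure and try again
    raise last_error
```

Note that URLError also covers HTTP errors such as 404, which are retried here even though retrying them is unlikely to help; a stricter version would re-raise those immediately.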
The way I crawl a huge batch of documents is with a batch processor that crawls and dumps constant-sized chunks.
Suppose you have to crawl a known batch of, say, 100K documents. You can have some logic generate constant-size chunks of, say, 1000 documents, which are downloaded by a thread pool. Once a whole chunk is crawled, you can do a bulk insert into your database, then proceed with the next 1000 documents, and so on.
Advantages of this approach:
The thread pool speeds up your crawl rate.
It is fault tolerant in the sense that you can continue from the chunk where it last failed.
You can generate chunks by priority, i.e. important documents are crawled first. If you are unable to complete the whole batch, the important documents are already processed and the less important ones can be picked up on the next run.
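The chunking and resume logic described above might be sketched as follows. All names are hypothetical: crawl_chunk stands for whatever downloads one chunk (for example with a thread pool), store_chunk for the bulk insert, and start_chunk for the position saved after a failed run.

```python
def chunked(items, size):
    """Yield consecutive lists of at most size items."""
    for start in range(0, len(items), size):
        yield list(items[start:start + size])


def crawl_in_chunks(documents, size, crawl_chunk, store_chunk, start_chunk=0):
    """Process documents chunk by chunk; return the index of the next chunk to run."""
    next_chunk = start_chunk
    for index, chunk in enumerate(chunked(documents, size)):
        if index < start_chunk:
            continue  # already handled in an earlier run
        store_chunk(crawl_chunk(chunk))
        next_chunk = index + 1  # persist this value to be able to resume
    return next_chunk
```

Sorting documents by priority before calling crawl_in_chunks gives the "important documents first" behaviour, since chunks are processed in order.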
An ugly hack that seems to work.
import os, socket, threading, errno

def timeout_http_body_read(response, timeout = 60):
    def murha(resp):
        os.close(resp.fileno())
        resp.close()

    # set a timer to yank the carpet underneath the blocking read() by closing the os file descriptor
    t = threading.Timer(timeout, murha, (response,))
    try:
        t.start()
        body = response.read()
        t.cancel()
    except socket.error as se:
        if se.errno == errno.EBADF: # murha happened
            return (False, None)
        raise
    return (True, body)
