I'm trying to apply a custom Python function to every frame of a video and write the modified frames out as a new video. My input is an MKV file with a variable framerate, and I'd like the output to match it exactly, so that each input frame corresponds to an output frame at the exact same timestamp.
I tried to use this example from ffmpeg-python. However, the timestamp information seems to be lost in the pipes: the output video has 689 frames when the input only has 300, and the durations don't match either (27 s vs 11 s for the input).
I also tried to first process each frame of my video and save the transformed version as PNGs, then "mask" the input video with the processed frames. This seems better in that the output video has the same 11 s duration as the input, but the frame count still doesn't match (313 vs 300).
Code for the python-ffmpeg solution:
import ffmpeg
import numpy as np

# in_filename / out_filename are the input and output video paths
width = 1920
height = 1080

# Decode the input into raw RGB frames on stdout
process1 = (
    ffmpeg
    .input(in_filename)
    .output('pipe:', format='rawvideo', pix_fmt='rgb24')
    .run_async(pipe_stdout=True)
)

# Encode raw RGB frames from stdin back into a video file
process2 = (
    ffmpeg
    .input('pipe:', format='rawvideo', pix_fmt='rgb24', s='{}x{}'.format(width, height))
    .output(out_filename, pix_fmt='yuv420p')
    .overwrite_output()
    .run_async(pipe_stdin=True)
)

while True:
    in_bytes = process1.stdout.read(width * height * 3)
    if not in_bytes:
        break
    in_frame = (
        np
        .frombuffer(in_bytes, np.uint8)
        .reshape([height, width, 3])
    )
    # Just add 1 to the pixels for the example
    out_frame = in_frame + 1
    process2.stdin.write(
        out_frame
        .astype(np.uint8)
        .tobytes()
    )

process2.stdin.close()
process1.wait()
process2.wait()
Code for the overlay solution:
ffmpeg -i in.mkv -i test/%d.png -filter_complex "[0][1]overlay=0:0" -copyts out.mkv
Is there another approach I haven't thought of to do what I'm trying to do? It doesn't seem like it should be that complicated, but I can't find a way to do it.
Thanks for any help!
UPDATE:
Here are the logs for the input and output pipes of the python-ffmpeg solution.
Input
Input #0, matroska,webm, from 'in.mkv':
Metadata:
ENCODER : Lavf59.17.100
Duration: 00:00:11.48, start: 0.000000, bitrate: 45702 kb/s
Stream #0:0: Video: h264 (High 4:4:4 Predictive), yuvj420p(pc, gbr/unknown/unknown, progressive), 1920x1080 [SAR 1:1 DAR 16:9], 60 fps, 60 tbr, 1k tbn (default)
Metadata:
ENCODER : Lavc58.134.100 h264_nvenc
DURATION : 00:00:11.483000000
Stream mapping:
Stream #0:0 -> #0:0 (h264 (native) -> rawvideo (native))
Output #0, rawvideo, to 'pipe:':
Metadata:
encoder : Lavf59.17.100
Stream #0:0: Video: rawvideo (RGB[24] / 0x18424752), rgb24(pc, gbr/unknown/unknown, progressive), 1920x1080 [SAR 1:1 DAR 16:9], q=2-31, 2985984 kb/s, 60 fps, 60 tbn (default)
Metadata:
DURATION : 00:00:11.483000000
encoder : Lavc59.20.100 rawvideo
frame= 689 fps=154 q=-0.0 Lsize= 4185675kB time=00:00:11.48 bitrate=2985984.1kbits/s dup=389 drop=0 speed=2.57x
Output
Input #0, rawvideo, from 'pipe:':
Duration: N/A, start: 0.000000, bitrate: 1244160 kb/s
Stream #0:0: Video: rawvideo (RGB[24] / 0x18424752), rgb24, 1920x1080, 1244160 kb/s, 25 tbr, 25 tbn
Stream mapping:
Stream #0:0 -> #0:0 (rawvideo (native) -> h264 (libx264))
[libx264 @ 0000025afaf11140] using cpu capabilities: MMX2 SSE2Fast SSSE3 SSE4.2 AVX FMA3 BMI2 AVX2 AVX512
[libx264 @ 0000025afaf11140] profile High, level 4.0, 4:2:0, 8-bit
[libx264 # 0000025afaf11140] 264 - core 164 r3081 19856cc - H.264/MPEG-4 AVC codec - Copyleft 2003-2021 - http://www.videolan.org/x264.html - options: cabac=1 ref=3 deblock=1:0:0 analyse=0x3:0x113 me=hex subme=7 psy=1 psy_rd=1.00:0.00 mixed_ref=1 me_range=16 chroma_me=1 trellis=1 8x8dct=1 cqm=0 deadzone=21,11 fast_pskip=1 chroma_qp_offset=-2 threads=18 lookahead_threads=3 sliced_threads=0 nr=0 decimate=1 interlaced=0 bluray_compat=0 constrained_intra=0 bframes=3 b_pyramid=2 b_adapt=1 b_bias=0 direct=1 weightb=1 open_gop=0 weightp=2 keyint=250 keyint_min=25 scenecut=40 intra_refresh=0 rc_lookahead=40 rc=crf mbtree=1 crf=23.0 qcomp=0.60 qpmin=0 qpmax=69 qpstep=4 ip_ratio=1.40 aq=1:1.00
Output #0, matroska, to 'images/videos/out.mkv':
Metadata:
encoder : Lavf59.17.100
Stream #0:0: Video: h264 (H264 / 0x34363248), yuv420p(tv, progressive), 1920x1080, q=2-31, 25 fps, 1k tbn
Metadata:
encoder : Lavc59.20.100 libx264
Side data:
cpb: bitrate max/min/avg: 0/0/0 buffer size: 0 vbv_delay: N/A
frame= 689 fps= 11 q=-0.0 Lsize= 4185675kB time=00:00:11.48 bitrate=2985984.1kbits/s dup=389 drop=0 speed=0.181x
video:4185675kB audio:0kB subtitle:0kB other streams:0kB global headers:0kB muxing overhead: 0.000000%
I'll answer my own question, as I've been able to solve the issue with the help of kesh in the comments.
There are basically two things:
vsync passthrough is required on the ffmpeg input, so that the original number of frames is preserved
an external tool (MKVToolNix) has to be used twice: once to extract the timestamps from the initial video, and once to apply them to the output
Below is the relevant code to perform the whole operation using Python and subprocess. You can run the following command on both the input and the output video to check that the timestamps are indeed the same for each frame: ffprobe -show_entries packet=pts_time,duration_time,stream_index video.mkv
import os
import subprocess

import ffmpeg
import numpy as np

width = 1920
height = 1080

# Decode the input to raw RGB frames, keeping every frame (vsync passthrough)
process1 = (
    ffmpeg
    .input('in.mkv', vsync='passthrough')
    .output('pipe:', format='rawvideo', pix_fmt='rgb24')
    .run_async(pipe_stdout=True)
)

# Encode the processed frames into a temporary file
process2 = (
    ffmpeg
    .input('pipe:', format='rawvideo', pix_fmt='rgb24', s='{}x{}'.format(width, height))
    .output('temp.mkv', pix_fmt='yuv420p')
    .overwrite_output()
    .run_async(pipe_stdin=True)
)

while True:
    in_bytes = process1.stdout.read(width * height * 3)
    if not in_bytes:
        break
    in_frame = (
        np
        .frombuffer(in_bytes, np.uint8)
        .reshape([height, width, 3])
    )
    # Keep things simple, just add 1 to each pixel
    out_frame = in_frame + 1
    process2.stdin.write(
        out_frame
        .astype(np.uint8)
        .tobytes()
    )

process2.stdin.close()
process1.wait()
process2.wait()

# Extract timestamps from the input video
subprocess.run(['mkvextract', 'in.mkv', 'timestamps_v2', '0:timestamps.txt'])
# Apply the extracted timestamps to create the synchronized output video
subprocess.run(['mkvmerge', '-o', 'out.mkv', '--timestamps', '0:timestamps.txt', 'temp.mkv'])

# Clean up
os.remove('temp.mkv')
os.remove('timestamps.txt')
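If you want to automate the ffprobe check mentioned above, here is a minimal sketch (assuming the file names used in this answer, and Python 3.7+ for capture_output) that compares the per-packet pts_time of the input and the final output:

def video_pts(path):
    # Per-packet pts_time of the first video stream, one value per line
    out = subprocess.run(
        ['ffprobe', '-v', 'error', '-select_streams', 'v:0',
         '-show_entries', 'packet=pts_time', '-of', 'csv=p=0', path],
        capture_output=True, text=True).stdout
    return [line for line in out.splitlines() if line]

assert video_pts('in.mkv') == video_pts('out.mkv')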
I am working with code published on GitHub: https://github.com/jrterven/audio-visual-dataset/blob/master/extract_detailed_text_watson.py. The code was designed to use 5 words, but I want to change it to 1 word. I tried to do that in the code, but I get an error at runtime, as shown below.
found 2 files
Processing video: health_news_2.mp4
video resolution: 608 x 1080
video framerate: 29.97002997002997
entry: <class 'dict'> {'link': 'build_Dataset', 'text': 'صورة', 'conf': 0.61, 'start': 2.07, 'end': 2.55, 'bounding_box': []}
s_sec, s_millisec: 2.0 69.99999999999984
entry: <class 'dict'> {'link': 'build_Dataset', 'text': 'مجهرية', 'conf': 0.97, 'start': 2.55, 'end': 3.24, 'bounding_box': []}
s_sec, s_millisec: 2.0 549.9999999999998
/Users/shaimaa/opt/anaconda3/lib/python3.8/site-packages/torch/nn/functional.py:780: UserWarning: Note that order of the arguments: ceil_mode and return_indices will changeto match the args list in nn.MaxPool2d in a future release.
warnings.warn("Note that order of the arguments: ceil_mode and return_indices will change"
entry: <class 'dict'> {'link': 'build_Dataset', 'text': 'مجهرية', 'conf': 0.97, 'start': 2.55, 'end': 3.24, 'bounding_box': [230, 126, 131, 171]}
ffmpeg version 4.4.1 Copyright (c) 2000-2021 the FFmpeg developers
built with Apple clang version 13.0.0 (clang-1300.0.29.3)
configuration: --prefix=/usr/local/Cellar/ffmpeg/4.4.1_3 --enable-shared --enable-pthreads --enable-version3 --cc=clang --host-cflags= --host-ldflags= --enable-ffplay --enable-gnutls --enable-gpl --enable-libaom --enable-libbluray --enable-libdav1d --enable-libmp3lame --enable-libopus --enable-librav1e --enable-librist --enable-librubberband --enable-libsnappy --enable-libsrt --enable-libtesseract --enable-libtheora --enable-libvidstab --enable-libvmaf --enable-libvorbis --enable-libvpx --enable-libwebp --enable-libx264 --enable-libx265 --enable-libxml2 --enable-libxvid --enable-lzma --enable-libfontconfig --enable-libfreetype --enable-frei0r --enable-libass --enable-libopencore-amrnb --enable-libopencore-amrwb --enable-libopenjpeg --enable-libspeex --enable-libsoxr --enable-libzmq --enable-libzimg --disable-libjack --disable-indev=jack --enable-avresample --enable-videotoolbox
libavutil 56. 70.100 / 56. 70.100
libavcodec 58.134.100 / 58.134.100
libavformat 58. 76.100 / 58. 76.100
libavdevice 58. 13.100 / 58. 13.100
libavfilter 7.110.100 / 7.110.100
libavresample 4. 0. 0 / 4. 0. 0
libswscale 5. 9.100 / 5. 9.100
libswresample 3. 9.100 / 3. 9.100
libpostproc 55. 9.100 / 55. 9.100
Input #0, mov,mp4,m4a,3gp,3g2,mj2, from 'build_Dataset/news/health_news_2.mp4':
Metadata:
major_brand : isom
minor_version : 512
compatible_brands: isomiso2avc1mp41
encoder : Lavf58.45.100
Duration: 00:04:50.88, start: 0.000000, bitrate: 603 kb/s
Stream #0:0(und): Video: h264 (High) (avc1 / 0x31637661), yuv420p(tv, bt709), 608x1080 [SAR 1:1 DAR 76:135], 468 kb/s, 29.97 fps, 29.97 tbr, 30k tbn, 59.94 tbc (default)
Metadata:
handler_name : ISO Media file produced by Google Inc.
vendor_id : [0][0][0][0]
Stream #0:1(eng): Audio: aac (LC) (mp4a / 0x6134706D), 44100 Hz, stereo, fltp, 127 kb/s (default)
Metadata:
handler_name : ISO Media file produced by Google Inc.
vendor_id : [0][0][0][0]
Stream mapping:
Stream #0:0 -> #0:0 (h264 (native) -> h264 (libx264))
Stream #0:1 -> #0:1 (copy)
Press [q] to stop, [?] for help
[libx264 # 0x7f7fe8810800] using SAR=1/1
[libx264 # 0x7f7fe8810800] using cpu capabilities: MMX2 SSE2Fast SSSE3 SSE4.2 AVX FMA3 BMI2 AVX2
[libx264 # 0x7f7fe8810800] profile High, level 1.1, 4:2:0, 8-bit
[libx264 # 0x7f7fe8810800] 264 - core 163 r3060 5db6aa6 - H.264/MPEG-4 AVC codec - Copyleft 2003-2021 - http://www.videolan.org/x264.html - options: cabac=1 ref=3 deblock=1:0:0 analyse=0x3:0x113 me=hex subme=7 psy=1 psy_rd=1.00:0.00 mixed_ref=1 me_range=16 chroma_me=1 trellis=1 8x8dct=1 cqm=0 deadzone=21,11 fast_pskip=1 chroma_qp_offset=-2 threads=5 lookahead_threads=1 sliced_threads=0 nr=0 decimate=1 interlaced=0 bluray_compat=0 constrained_intra=0 bframes=3 b_pyramid=2 b_adapt=1 b_bias=0 direct=1 weightb=1 open_gop=0 weightp=2 keyint=250 keyint_min=25 scenecut=40 intra_refresh=0 rc_lookahead=40 rc=crf mbtree=1 crf=23.0 qcomp=0.60 qpmin=0 qpmax=69 qpstep=4 ip_ratio=1.40 aq=1:1.00
Output #0, mp4, to '/Users/shaimaa/Downloads/LIP_Reading/Code/audio-visual-dataset-master/results_news/news/health_news_2/1.mp4':
Metadata:
major_brand : isom
minor_version : 512
compatible_brands: isomiso2avc1mp41
encoder : Lavf58.76.100
Stream #0:0(und): Video: h264 (avc1 / 0x31637661), yuv420p(tv, bt709, progressive), 130x170 [SAR 1:1 DAR 13:17], q=2-31, 29.97 fps, 30k tbn (default)
Metadata:
handler_name : ISO Media file produced by Google Inc.
vendor_id : [0][0][0][0]
encoder : Lavc58.134.100 libx264
Side data:
cpb: bitrate max/min/avg: 0/0/0 buffer size: 0 vbv_delay: N/A
Stream #0:1(eng): Audio: aac (LC) (mp4a / 0x6134706D), 44100 Hz, stereo, fltp, 127 kb/s (default)
Metadata:
handler_name : ISO Media file produced by Google Inc.
vendor_id : [0][0][0][0]
frame= 21 fps=0.0 q=-1.0 Lsize= 18kB time=00:00:00.67 bitrate= 222.3kbits/s dup=1 drop=0 speed=4.86x
video:5kB audio:11kB subtitle:0kB other streams:0kB global headers:0kB muxing overhead: 13.600434%
[libx264 # 0x7f7fe8810800] frame I:1 Avg QP:23.12 size: 1388
[libx264 # 0x7f7fe8810800] frame P:5 Avg QP:25.05 size: 401
[libx264 # 0x7f7fe8810800] frame B:15 Avg QP:30.82 size: 85
[libx264 # 0x7f7fe8810800] consecutive B-frames: 4.8% 0.0% 0.0% 95.2%
[libx264 # 0x7f7fe8810800] mb I I16..4: 3.0% 80.8% 16.2%
[libx264 # 0x7f7fe8810800] mb P I16..4: 1.4% 2.6% 0.4% P16..4: 51.3% 19.2% 6.3% 0.0% 0.0% skip:18.8%
[libx264 # 0x7f7fe8810800] mb B I16..4: 0.0% 0.1% 0.1% B16..8: 31.9% 2.7% 0.5% direct: 0.7% skip:64.0% L0:31.9% L1:64.7% BI: 3.4%
[libx264 # 0x7f7fe8810800] 8x8 transform intra:76.4% inter:84.5%
[libx264 # 0x7f7fe8810800] coded y,uvDC,uvAC intra: 75.0% 89.4% 42.3% inter: 7.0% 6.6% 0.3%
[libx264 # 0x7f7fe8810800] i16 v,h,dc,p: 0% 40% 10% 50%
[libx264 # 0x7f7fe8810800] i8 v,h,dc,ddl,ddr,vr,hd,vl,hu: 15% 32% 8% 5% 7% 8% 7% 10% 9%
[libx264 # 0x7f7fe8810800] i4 v,h,dc,ddl,ddr,vr,hd,vl,hu: 21% 26% 9% 7% 7% 10% 4% 10% 6%
[libx264 # 0x7f7fe8810800] i8c dc,h,v,p: 38% 34% 20% 8%
[libx264 # 0x7f7fe8810800] Weighted P-Frames: Y:0.0% UV:0.0%
[libx264 # 0x7f7fe8810800] ref P L0: 68.6% 9.7% 15.4% 6.2%
[libx264 # 0x7f7fe8810800] ref B L0: 91.5% 5.4% 3.1%
[libx264 # 0x7f7fe8810800] ref B L1: 89.9% 10.1%
[libx264 # 0x7f7fe8810800] kb/s:53.25
entry: <class 'dict'> {'link': 'build_Dataset', 'text': 'تظهر', 'conf': 1.0, 'start': 3.24, 'end': 3.66, 'bounding_box': []}
s_sec, s_millisec: 3.0 240.00000000000023
entry: <class 'dict'> {}
Traceback (most recent call last):
File "extract_subvideos.py", line 424, in <module>
main(args)
File "extract_subvideos.py", line 140, in main
s_sec, s_millisec = divmod(float(entry['start']), 1)
KeyError: 'start'
This is the part of the code where I changed only the number of words:
fa = FaceAlignment()

videos_directory = args.videos_dir
results_dir = args.results_dir
vids_name = args.category
vid_proc_name = args.log_file
dataset_annotation_file = args.ann_file

if args.save_videos == 'True':
    save_videos = True
else:
    save_videos = False

# Create video window
cv2.namedWindow('Vid')

# load or create list with processed files
processed_files = []
videos_processed_exists = os.path.isfile(
    os.path.join(results_dir, vid_proc_name))
if not videos_processed_exists:
    with open(os.path.join(results_dir, vid_proc_name), "w") as fp:
        for pfiles in processed_files:
            print(pfiles, file=fp)
else:
    with open(os.path.join(results_dir, vid_proc_name)) as fp:
        processed_files = fp.read().splitlines()

# Create annotation file the first time
annotation_exists = os.path.isfile(os.path.join(
    results_dir, dataset_annotation_file))
if not annotation_exists:
    try:
        with open(os.path.join(
                results_dir, dataset_annotation_file), 'w') as csvfile:
            writer = csv.DictWriter(csvfile, fieldnames=csv_columns)
            writer.writeheader()
    except IOError:
        print("Error creating annotation file. I/O error")

# Get json file names in the videos directory
files_list = []
for ann_file in os.listdir(os.path.join(videos_directory, vids_name)):
    if ann_file.endswith(".json"):
        files_list.append(ann_file[0:-5])

files_list = natsorted(files_list)
num_files = len(files_list)
print('found', num_files, 'files')
# traverse all the files
stop_videos = False
for file in files_list:
    if stop_videos:
        break

    # check if current video has already been processed
    if file in processed_files:
        print(file, 'has already been processed. Skipping it.')
        continue

    num_output_video = 0

    # Search for the video files in videos_directory
    video_name = file + '.mp4'
    print('Processing video:', video_name)

    if save_videos:
        # create output directory
        output_dir = os.path.join(results_dir, vids_name, file)
        if not os.path.isdir(output_dir):
            os.mkdir(output_dir)

    # Load watson results
    with open(os.path.join(
            videos_directory, vids_name, file + '.json')) as f:
        stt_results = json.load(f)

    # Extract all the words with confidence >90
    words_data = extract_words_from_watson_results(stt_results, max_words=5)

    # Start the video capture
    cap = cv2.VideoCapture(os.path.join(
        videos_directory, vids_name, video_name))

    # Extract video metadata
    height = int(cap.get(cv2.CAP_PROP_FRAME_HEIGHT))
    width = int(cap.get(cv2.CAP_PROP_FRAME_WIDTH))
    fps = cap.get(cv2.CAP_PROP_FPS)
    print('video resolution:', width, ' x ', height)
    print('video framerate:', fps)

    frame_count = 0
    fps_processing = 30.0  # fps holder
    t = cv2.getTickCount()  # initiate the tickCounter
    count = 0
    for entry in words_data:
        # Extract speech to text data
        print('entry:', type(entry), entry)
        s_sec, s_millisec = divmod(float(entry['start']), 1)
        e_sec, e_millisec = divmod(float(entry['end']), 1)
        s_min = 0
        e_min = 0
        s_millisec = s_millisec * 1000
        e_millisec = e_millisec * 1000
        print('s_sec, s_millisec:', s_sec, s_millisec)

        if s_sec >= 60:
            s_min = math.floor(s_sec / 60.0)
            s_sec = s_sec % 60
        if e_sec >= 60:
            e_min = math.floor(e_sec / 60.0)
            e_sec = e_sec % 60

        # Determine video frames involved in stt entry
        min_frame = s_min*fps*60 + (s_sec*fps)
        max_frame = e_min*fps*60 + (e_sec*fps)

        # go to min_frame
        cap.set(cv2.CAP_PROP_POS_FRAMES, min_frame)
        frame_count = min_frame

        # read frames from min_frame to max_frame
        num_people = 0
        valid_video = True
        landmarks = []
        angles = []
        consecutive_frames_no_people = 0
        while frame_count < max_frame:
            if count == 0:
                t = cv2.getTickCount()

            # capture next frame
            ret, frame = cap.read()
            if not ret:
                continue
            frame_count += 1

            # resize frame for faster processing
            if frame.shape[0] <= 0 or frame.shape[1] <= 0:
                continue
            frame_small = cv2.resize(frame, (0, 0), fx=scale, fy=scale,
                                     interpolation=cv2.INTER_LINEAR)

            # detect faces and landmarks
            fa.update_features(frame_small)
            landmarks.append(fa.get_mouth_features(scale=scale))
            num_people = fa.get_num_people()
            angles.append(fa.get_yaw())

            # if it detects less than or more than 1 person
            # go to next subtitle
            if num_people != 1:
                consecutive_frames_no_people += 1
                if consecutive_frames_no_people >= max_bad_frames:
                    print(consecutive_frames_no_people,
                          ' frames without 1 person. Skipping to next subtitle')
                    valid_video = False
                    break

            # if only one person in the scene
            if num_people == 1:
                consecutive_frames_no_people = 0
                fa.renderMouth(frame_small)

            # Put fps at which we are processing camera feed on frame
            cv2.putText(frame_small, "{0:.2f}-fps".format(fps_processing),
                        (50, height-50), cv2.FONT_HERSHEY_COMPLEX,
                        1, (0, 0, 255), 2)

            # Display the image
            cv2.imshow('Vid', frame_small)

            # Read keyboard and exit if ESC was pressed
            k = cv2.waitKey(1) & 0xFF
            if k == 27:
                exit()
            elif k == ord('q'):
                stop_videos = True

            # increment frame counter
            count = count + 1
            # calculate fps at an interval of 30 frames
            if count == 30:
                t = (cv2.getTickCount() - t)/cv2.getTickFrequency()
                fps_processing = 30.0/t
                count = 0
        # if this was a valid video
        if valid_video and len(landmarks) > 0:
            num_output_video += 1
            entry['mouth3d'] = landmarks
            entry['angle'] = angles

            if save_videos:
                s_hr = 0
                e_hr = 0
                if s_min >= 60:
                    s_hr = math.floor(s_min / 60)
                    s_min = s_min % 60
                if e_min >= 60:
                    e_hr = math.floor(e_min / 60)
                    e_min = e_min % 60

                # cut and crop video
                # ffmpeg -i input.mp4 -ss hh:mm:ss -filter:v crop=w:h:x:y -c:a copy -to hh:mm:ss output.mp4
                ss = "{0:02d}:{1:02d}:{2:02d}.{3:03d}".format(
                    s_hr, s_min, int(s_sec), math.ceil(s_millisec))
                es = "{0:02d}:{1:02d}:{2:02d}.{3:03d}".format(
                    e_hr, e_min, int(e_sec), math.ceil(e_millisec))
                crop = "crop={0:1d}:{1:1d}:{2:1d}:{3:1d}".format(
                    bbw, bbh, bbx1, bby1)
                out_name = os.path.join(output_dir, str(num_output_video))
                subprocess.call(['ffmpeg',  # '-hide_banner', '-loglevel', 'panic',
                                 '-i', os.path.join(
                                     videos_directory, vids_name, video_name),
                                 '-ss', ss,
                                 '-filter:v', crop, '-c:a', 'copy',
                                 '-to', es, out_name + '.mp4'])

                # save recognized speech
                text_file = open(out_name + '.txt', "w")
                text_file.write(entry['text'] + '\n')
                text_file.write(str(entry['conf']))
                text_file.close()

    # append results to annotation file
    append_annotation_file(os.path.join(
        results_dir, dataset_annotation_file), words_data)

    # save name of processed file
    processed_files.append(file)
    with open(os.path.join(results_dir, vid_proc_name), "w") as fp:
        for p_file in processed_files:
            print(p_file, file=fp)

# Release resources
cap.release()
cv2.destroyAllWindows()
def extract_text_conf_ts(s_idx, max_words, num_words, timestamps, conf, link):
    text = ''
    avg_conf = 0
    start = timestamps[int(s_idx * max_words)][1]
    end = timestamps[int(s_idx * max_words + num_words-1)][2]
    for w_idx in range(num_words):
        text = text + ' ' + timestamps[int(s_idx*max_words + w_idx)][0]
        avg_conf += conf[int(s_idx*max_words + w_idx)][1]
    avg_conf = round(avg_conf/num_words, 2)

    if len(text.strip()) >= 4:
        out_entry = {'link': link, 'text': text.strip(), 'conf': avg_conf,
                     'start': start, 'end': end, 'mouth3d': [],
                     'angle': []}
    else:
        out_entry = {}

    return out_entry
def extract_words_from_watson_results(stt_results, max_words=5):
    data = stt_results['results']
    link = stt_results['link']
    link = link.rsplit('/', 1)[-1]
    out_data = []
    for sentence_idx, ann in enumerate(data):
        data_ann = ann['alternatives'][0]
        text = data_ann['transcript']
        conf = data_ann['word_confidence']
        timestamps = data_ann['timestamps']

        num_words = len(timestamps)
        num_splits = num_words//max_words
        rest = num_words%max_words

        if num_words < max_words:
            maxx_words = num_words
        else:
            maxx_words = max_words

        for s_idx in range(num_splits):
            out_entry = extract_text_conf_ts(s_idx, maxx_words, maxx_words,
                                             timestamps, conf, link)
            out_data.append(out_entry)

        if rest > 0:
            out_entry = extract_text_conf_ts(num_splits, maxx_words, rest,
                                             timestamps, conf, link)
            if out_entry:
                out_data.append(out_entry)

    return out_data
def append_annotation_file(csv_file, data):
    try:
        with open(csv_file, 'a') as csvfile:
            writer = csv.DictWriter(csvfile, fieldnames=csv_columns)
            for entry in data:
                writer.writerow(entry)
    except IOError:
        print("I/O error")

def remove_accents(input_str):
    nfkd_form = unicodedata.normalize('NFKD', input_str)
    only_ascii = nfkd_form.encode('ASCII', 'ignore')
    return only_ascii

if __name__ == "__main__":
    # Parse input arguments
    parser = argparse.ArgumentParser(description='Extract subvideos')
    parser.add_argument('--dir', dest='videos_dir',
                        help='Directory with videos', type=str)
    parser.add_argument('--cat', dest='category',
                        help='Video category', type=str)
    parser.add_argument('--vids_log', dest='log_file',
                        help='Name of log file', type=str)
    parser.add_argument('--results_dir', dest='results_dir',
                        help='Directory with results', type=str)
    parser.add_argument('--ann_file', dest='ann_file',
                        help='Annotations file (csv)', type=str)
    parser.add_argument('--save_videos', dest='save_videos',
                        help='Save videos', type=str, default='False')
    args = parser.parse_args()

    main(args)
I'm trying to get a YouTube video using youtube_dl, and everything works fine except for the fact that I only get the audio.
from youtube_dl import YoutubeDL

link = "SOME_YOUTUBE_VIDEO"  # as it was only a video
with YoutubeDL({}) as ydl:
    info = ydl.extract_info(link, download=False)
    url = info['formats'][0]['url']
    title = info["title"]
    print(url, title)

Listing the available formats:

from youtube_dl import YoutubeDL

link = "https://www.youtube.com/watch?v=Y9wBC3H4iH4"
with YoutubeDL({}) as ydl:
    info = ydl.extract_info(link, download=False)
    for i, format in enumerate(info['formats']):
        print(f"[{i}] {format['format']}")
output:
[youtube] Y9wBC3H4iH4: Downloading webpage
[0] 249 - audio only (tiny)
[1] 250 - audio only (tiny)
[2] 251 - audio only (tiny)
[3] 140 - audio only (tiny)
[4] 160 - 256x144 (144p)
[5] 278 - 256x144 (144p)
[6] 242 - 426x240 (240p)
[7] 133 - 426x240 (240p)
[8] 243 - 640x360 (360p)
[9] 134 - 640x360 (360p)
[10] 244 - 854x480 (480p)
[11] 135 - 854x480 (480p)
[12] 247 - 1280x720 (720p)
[13] 136 - 1280x720 (720p)
[14] 248 - 1920x1080 (1080p)
[15] 137 - 1920x1080 (1080p)
[16] 18 - 640x360 (360p)
[17] 22 - 1280x720 (720p)
It literally says audio only for some of the formats! Select one of the non-audio-only formats, and you won't get audio only. Note that which formats are available very much depends on which video you're trying to download.
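For instance, here is a minimal sketch of picking an entry that actually has a video stream (youtube_dl reports 'vcodec' as 'none' for the audio-only entries, and the formats list is roughly ordered worst-to-best, but check this against your own video):

from youtube_dl import YoutubeDL

link = "https://www.youtube.com/watch?v=Y9wBC3H4iH4"
with YoutubeDL({}) as ydl:
    info = ydl.extract_info(link, download=False)

# Keep only the entries that contain a video stream
video_formats = [f for f in info['formats'] if f.get('vcodec') != 'none']
best = video_formats[-1]  # formats are listed from worst to best
print(best['format'], best['url'])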
Bluez Question:
I have a Raspberry Pi Zero W, an 8BitDo SF30 Bluetooth controller, and a robot my employer mass-produces. I am looking to control the robot from the RPi through the remote.
I have the RPi connecting to the controller using bluetoothctl, and I can see the messages from the controller through btmon (the Bluetooth monitor).
It becomes obvious that the four "80" values below correspond to the two joysticks at their mid points.
> ACL Data RX: Handle 11 flags 0x02 dlen 15 #30 [hci0] 2.555574
Channel: 65 len 11 [PSM 0 mode Basic (0x00)] {chan 65535}
a1 03 0f 80 80 12 31 00 00 00 00 ......1....
> ACL Data RX: Handle 11 flags 0x02 dlen 15 #31 [hci0] 2.587293
Channel: 65 len 11 [PSM 0 mode Basic (0x00)] {chan 65535}
a1 03 0f 80 80 27 4b 00 00 00 00 .....'K....
> ACL Data RX: Handle 11 flags 0x02 dlen 15 #32 [hci0] 2.613543
Channel: 65 len 11 [PSM 0 mode Basic (0x00)] {chan 65535}
a1 03 0f 80 80 61 7b 00 00 00 00 .....a{....
> ACL Data RX: Handle 11 flags 0x02 dlen 15 #33 [hci0] 2.615552
Channel: 65 len 11 [PSM 0 mode Basic (0x00)] {chan 65535}
a1 03 0f 80 80 80 80 00 00 00 00 ...........
> ACL Data RX: Handle 11 flags 0x02 dlen 15 #34 [hci0] 74.653567
Channel: 65 len 11 [PSM 0 mode Basic (0x00)] {chan 65535}
a1 03 0f 80 80 80 80 00 00 00 08 ...........
I have also been reading data from the two resulting /dev/input/ files (/dev/input/js0 and /dev/input/event0) in Python, using the struct format "LLHHQ".
I thought that the same data shown by btmon (which is easy to interpret) would be represented in the Q part of that format (the last number below).
(470898350, 155732, 22190, 7185, 16919435)
(470898350, 160124, 22190, 7185, 16916057)
(470898380, 162488, 22220, 7185, 163502)
(470898380, 16915382, 22260, 7185, 16910652)
(470898420, 16908288, 22290, 7185, 161137)
(470898450, 16971797, 22300, 7185, 155732)
(470898460, 16966392, 22330, 7185, 154043)
(470898490, 16966054, 22340, 7185, 147287)
(470898500, 16967405, 22340, 7185, 131072)
(470898500, 16908288, 22740, 7185, 151060481)
(470899070, 151060480, 22970, 7185, 134283265)
(470899320, 134283264, 23200, 7185, 117506049)
(470899550, 117506048, 23420, 7185, 100728833)
(470899750, 100728832, 23590, 7185, 117506049)
(470899910, 117506048, 23930, 7185, 134283265)
(470900310, 134283264, 25110, 7185, 100728833)
(470901380, 117506049, 25250, 7185, 134283265)
(470901490, 100728832, 25390, 7185, 117506048)
(470901710, 134283264, 25580, 7185, 100728833)
(470901750, 117506049, 25720, 7185, 117506048)
(470901940, 134283265, 25810, 7185, 100728832)
(470902160, 100728833, 26070, 7185, 134283264)
(470902400, 100728832, 26690, 7185, 134283265)
(470903070, 134283264, 27130, 7185, 151060481)
(470903430, 151060480, 27360, 7185, 100728833)
However, these outputs don't appear to correspond when read as binary; for example, the two joysticks seem to change the same bits.
The basic question is: how do I get the same data that btmon shows, from BlueZ-based code?
This is the Python code I am using at the moment.
f = open( "/dev/input/js0", "rb" ); # Open the file in the read-binary mode
EVENT_SIZE = struct.calcsize("LLHHQ")
while 1:
data = f.read(EVENT_SIZE)
unpacked_data = struct.unpack('llHHQ',data)
# print("Length:" + str(len(unpacked_data)))
# print(unpacked_data)
remote_data = unpacked_data[4]
print(format(remote_data, '064b'))
It might be helpful to use a library like evdev as it will do much of the heavy lifting for you.
Example of using this might be:
from time import sleep
from pydbus import SystemBus
import evdev

# For testing
# python3 -m evdev.evtest


class Controller:

    def __init__(self, adapter_int=0):
        adapter_path = '/org/bluez/hci{}'.format(adapter_int)
        self.dbus = SystemBus()
        self.adapter = self.dbus.get('org.bluez', adapter_path)
        # Use bluetoothctl to find out what the path is for your controller
        self.controller = self.dbus.get('org.bluez', '/org/bluez/hci0/dev_DC_0C_2D_20_DA_E8')
        print('Waiting for connection from DC:0C:2D:20:DA:E8')
        # self.controller.Discoverable = True
        while not self.controller.Connected:
            sleep(1)
        print('Connected')
        sleep(6)
        # https://python-evdev.readthedocs.io/en/latest/tutorial.html to get path of your controller
        self.device = evdev.InputDevice('/dev/input/event2')
        self.max_value = 0
        self.min_value = 255
        self.max_throttle = 1
        self.min_throttle = -1
        self.right_steering = 1
        self.left_steering = -1

    def map_throttle(self, value):
        input_range = self.max_value - self.min_value
        output_range = self.max_throttle - self.min_throttle
        input_percentage = (value - self.min_value) / input_range
        output_value = (output_range * input_percentage) + self.min_throttle
        return round(output_value, 2)

    def map_steering(self, value):
        input_range = self.max_value - self.min_value
        output_range = self.right_steering - self.left_steering
        input_percentage = (value - self.min_value) / input_range
        output_value = (output_range * input_percentage) + self.left_steering
        return round(output_value, 2)

    def get_events(self):
        for event in self.device.read_loop():
            ly = None
            rx = None
            btn = None
            if event.type == evdev.ecodes.EV_ABS:
                if event.code == 1:
                    # print('Left:', event.value)
                    ly = self.map_throttle(event.value)
                if event.code == 3:
                    # print('Right:', event.value)
                    rx = self.map_steering(event.value)
            if event.type == evdev.ecodes.EV_KEY:
                if event.code == evdev.ecodes.BTN_SOUTH and event.value == 0:
                    btn = 'BTN_SOUTH'
                elif event.code == evdev.ecodes.BTN_WEST and event.value == 0:
                    btn = 'BTN_WEST'
                elif event.code == evdev.ecodes.BTN_NORTH and event.value == 0:
                    btn = 'BTN_NORTH'
                elif event.code == evdev.ecodes.BTN_EAST and event.value == 0:
                    btn = 'BTN_EAST'
            yield ly, rx, btn


if __name__ == '__main__':
    ctrl = Controller()
    for speed, steer, action in ctrl.get_events():
        print('Speed: {}, Steer: {}, Button: {}'.format(speed, steer, action))
If you wanted to go a little higher level, then a library like https://github.com/ApproxEng/approxeng.input is a popular one.
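For example, a rough sketch with approxeng.input, based on its documented ControllerResource API; the axis names below are the library's standard ones and may need checking against the SF30's mapping:

from approxeng.input.selectbinder import ControllerResource

with ControllerResource() as joystick:
    while joystick.connected:
        # Left stick axes, already scaled to the -1.0 .. 1.0 range
        lx, ly = joystick['lx', 'ly']
        print('Speed: {}, Steer: {}'.format(ly, lx))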
Here is the original command line call that works at shell:
ffmpeg -i /Users/abc/Desktop/Test/Full_Mov.mov -vf "drawtext=fontfile=/System/Library/Fonts/Keyboard.ttf: text='SCENE BLAH BLAH - %{frame_num}': start_number=1: x=(w-tw)/2: y=h-(2*lh): fontcolor=white: fontsize=20: box=1: boxcolor=black: boxborderw=5","format=yuv420p" -f segment -segment_frames 123 -reset_timestamps 1 -c:a copy -map 0 "/Users/abc/Desktop/Test/%03d_test40.mov"
I'm getting a negative return value when trying to run ffmpeg via subprocess, which causes it to fail without processing anything. I've even tried dumbing the call down to running ffmpeg by itself with no arguments, and it still returns a non-zero value.
import subprocess
ffmpeg = "/usr/local/bin/ffmpeg"
source = "/Users/abc/Desktop/Test/Full_Mov.mov"
destination = "/Users/abc/Desktop/Test/%03d_test40.mov"
cmd = "%s -i %s -vf \"drawtext=fontfile=/System/Library/Fonts/Keyboard.ttf: text='SCENE BLAH BLAH - %%{frame_num}': start_number=1: x=(w-tw)/2: y=h-(2*lh): fontcolor=black: fontsize=20: box=1: boxcolor=white: boxborderw=5\",\"format=yuv420p\" -f segment -segment_frames 123 -reset_timestamps 1 -c:a copy -map 0 \"%s\"" % (ffmpeg, source, destination)
log("ffmpeg cmd: %s" % cmd)
log(subprocess.check_output(cmd, shell=True))
When running full command:
CalledProcessError: Command '/usr/local/bin/ffmpeg -i /Users/abc/Desktop/Full_Mov.mov -vf "drawtext=fontfile=/System/Library/Fonts/Keyboard.ttf: text='SCENE BLAH BLAH - %{frame_num}': start_number=1: x=(w-tw)/2: y=h-(2*lh): fontcolor=black: fontsize=20: box=1: boxcolor=white: boxborderw=5","format=yuv420p" -f segment -segment_frames 318 -reset_timestamps 1 -c:a copy -map 0 "/Users/abc/Desktop/Test/%03d.tmp.mov"' returned non-zero exit status -8
raise CalledProcessError(retcode, cmd, output=output)
When running just ffmpeg with no arguments:
CalledProcessError: Command '/usr/local/bin/ffmpeg' returned non-zero exit status 1
raise CalledProcessError(retcode, cmd, output=output)
Latest output with suggested changes below:
subprocess.CalledProcessError: Command '['/usr/local/bin/ffmpeg', '-i', '/Users/szaharak/Desktop/Flix_Test/ep888_sq66_main_mov_2019_05_20_14_15.mov', '-vf', 'drawtext=fontfile=/System/Library/Fonts/Keyboard.ttf:', 'text=SCENE BLAH BLAH - %%{frame_num}:', 'start_number=1:', 'x=(w-tw)/2:', 'y=h-(2*lh):', 'fontcolor=black:', 'fontsize=20:', 'box=1:', 'boxcolor=white:', 'boxborderw=5', 'format=yuv420p', '-f', 'segment', '-segment_frames', '123', '-reset_timestamps', '1', '-c:a', 'copy', '-map', '0', '/Users/szaharak/Desktop/Flix_Test/%03d_test40.mov']' returned non-zero exit status 1
[NULL # 0x7f88c3012200] Unable to find a suitable output format for 'text=SCENE BLAH BLAH - %%{frame_num}:'
text=SCENE BLAH BLAH - %%{frame_num}:: Invalid argument
And here is the latest output:
>>> rc = subprocess.check_call(cmd)
ffmpeg version N-93891-ge1839283bc-tessus https://evermeet.cx/ffmpeg/ Copyright (c) 2000-2019 the FFmpeg developers
built with Apple LLVM version 10.0.1 (clang-1001.0.46.4)
configuration: --cc=/usr/bin/clang --prefix=/opt/ffmpeg --extra-version=tessus --enable-avisynth --enable-fontconfig --enable-gpl --enable-libaom --enable-libass --enable-libbluray --enable-libdav1d --enable-libfreetype --enable-libgsm --enable-libmodplug --enable-libmp3lame --enable-libmysofa --enable-libopencore-amrnb --enable-libopencore-amrwb --enable-libopenh264 --enable-libopenjpeg --enable-libopus --enable-librubberband --enable-libshine --enable-libsnappy --enable-libsoxr --enable-libspeex --enable-libtheora --enable-libtwolame --enable-libvidstab --enable-libvo-amrwbenc --enable-libvorbis --enable-libvpx --enable-libwavpack --enable-libwebp --enable-libx264 --enable-libx265 --enable-libxavs --enable-libxvid --enable-libzimg --enable-libzmq --enable-libzvbi --enable-version3 --pkg-config-flags=--static --disable-ffplay
libavutil 56. 28.100 / 56. 28.100
libavcodec 58. 52.102 / 58. 52.102
libavformat 58. 27.103 / 58. 27.103
libavdevice 58. 7.100 / 58. 7.100
libavfilter 7. 53.101 / 7. 53.101
libswscale 5. 4.101 / 5. 4.101
libswresample 3. 4.100 / 3. 4.100
libpostproc 55. 4.100 / 55. 4.100
Input #0, mov,mp4,m4a,3gp,3g2,mj2, from '/Users/szaharak/Desktop/Flix_Test/ep888_sq66_main_mov_2019_05_20_14_15.mov':
Metadata:
major_brand : qt
minor_version : 537199360
compatible_brands: qt
creation_time : 2019-05-14T17:58:24.000000Z
com.apple.quicktime.player.movie.audio.gain: 1.000000
com.apple.quicktime.player.movie.audio.treble: 0.000000
com.apple.quicktime.player.movie.audio.bass: 0.000000
com.apple.quicktime.player.movie.audio.balance: 0.000000
com.apple.quicktime.player.movie.audio.pitchshift: 0.000000
com.apple.quicktime.player.movie.audio.mute:
com.apple.quicktime.player.movie.visual.brightness: 0.000000
com.apple.quicktime.player.movie.visual.color: 1.000000
com.apple.quicktime.player.movie.visual.tint: 0.000000
com.apple.quicktime.player.movie.visual.contrast: 1.000000
com.apple.quicktime.player.version: 7.6.6 (7.6.6)
com.apple.quicktime.version: 7.7.3 (2943.14) 0x7738000 (Mac OS X, 10.11.6, 15G22010)
Duration: 00:01:12.67, start: 0.000000, bitrate: 23379 kb/s
Stream #0:0(eng): Video: mjpeg (Baseline) (jpeg / 0x6765706A), yuvj422p(pc, bt470bg/unknown/unknown), 1280x720 [SAR 72:72 DAR 16:9], 21838 kb/s, 24 fps, 24 tbr, 24k tbn, 24k tbc (default)
Metadata:
creation_time : 2019-05-14T17:58:24.000000Z
handler_name : Apple Video Media Handler
encoder : Photo - JPEG
Stream #0:1(eng): Audio: pcm_s16be (twos / 0x736F7774), 48000 Hz, stereo, s16, 1536 kb/s (default)
Metadata:
creation_time : 2019-05-14T17:58:24.000000Z
handler_name : Apple Sound Media Handler
[NULL # 0x7f8ddc8ce200] Unable to find a suitable output format for 'text=SCENE BLAH BLAH - %{frame_num}: start_number=1: x=(w-tw)/2: y=h-(2*lh): fontcolor=black: fontsize=20: box=1: boxcolor=white: boxborderw=5: format=yuv420p'
text=SCENE BLAH BLAH - %{frame_num}: start_number=1: x=(w-tw)/2: y=h-(2*lh): fontcolor=black: fontsize=20: box=1: boxcolor=white: boxborderw=5: format=yuv420p: Invalid argument
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/usr/local/Cellar/python#2/2.7.15_1/Frameworks/Python.framework/Versions/2.7/lib/python2.7/subprocess.py", line 190, in check_call
raise CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command '['/usr/local/bin/ffmpeg', '-i', '/Users/szaharak/Desktop/Flix_Test/ep888_sq66_main_mov_2019_05_20_14_15.mov', '-vf', 'drawtext=fontfile=/System/Library/Fonts/Keyboard.ttf:', 'text=SCENE BLAH BLAH - %{frame_num}: start_number=1: x=(w-tw)/2: y=h-(2*lh): fontcolor=black: fontsize=20: box=1: boxcolor=white: boxborderw=5: format=yuv420p', '-f', 'segment', '-segment_frames', '123', '-reset_timestamps', '1', '-c:a', 'copy', '-map', '0', '/Users/szaharak/Desktop/Flix_Test/%03d_test40.mov']' returned non-zero exit status 1
The ffmpeg command line is tricky: there are a lot of special arguments, and mistakes are sometimes difficult to understand because the program may misinterpret or discard the wrong arguments, resulting in cryptic error messages.
Another difficulty is using check_output. You don't really need it, and if the program fails you won't have any output at all.
I would try the following (which I could not test):
drop shell=True
pass a list of arguments instead of composing the string yourself; that lets you forget about quoting/escaping, and you don't need to format the command, source and destination into the string since they're standalone arguments
don't capture the output, just run the command with check_call; as a bonus you'll see ffmpeg's output in real time instead of only at the end
Note that I have also removed the single quotes around text=SCENE BLAH BLAH - %{frame_num}: and unescaped the % character.
like this:
cmd = [ffmpeg, "-i", source, "-vf",
       "drawtext=fontfile=/System/Library/Fonts/Keyboard.ttf:",
       "text=SCENE BLAH BLAH - %{frame_num}: start_number=1: x=(w-tw)/2: y=h-(2*lh): fontcolor=black: fontsize=20: box=1: boxcolor=white: boxborderw=5: format=yuv420p",
       "-f", "segment", "-segment_frames", "123",
       "-reset_timestamps", "1", "-c:a", "copy", "-map", "0", destination]
log("ffmpeg cmd: %s" % cmd)
rc = subprocess.check_call(cmd)
if rc:
    raise Exception("ffmpeg failed")
For the case of ffmpeg, you could also migrate to the ffmpeg-python module, which could save you some argument-quoting trouble too, for example:
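This is only a rough, untested sketch of the same command with ffmpeg-python; it reuses the source and destination variables from above, and passes escape_text=False so that %{frame_num} reaches drawtext's text expansion unescaped:

import ffmpeg

inp = ffmpeg.input(source)
video = (
    inp.video
    .drawtext(text='SCENE BLAH BLAH - %{frame_num}',
              fontfile='/System/Library/Fonts/Keyboard.ttf',
              start_number=1, x='(w-tw)/2', y='h-(2*lh)',
              fontcolor='black', fontsize=20,
              box=1, boxcolor='white', boxborderw=5,
              escape_text=False)  # keep %{frame_num} unescaped
    .filter('format', 'yuv420p')
)
(
    ffmpeg
    .output(video, inp.audio, destination,
            f='segment', segment_frames='123', reset_timestamps=1,
            **{'c:a': 'copy'})
    .overwrite_output()
    .run()
)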
I have a workstation with these specifications:
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
Address sizes: 46 bits physical, 48 bits virtual
CPU(s): 16
On-line CPU(s) list: 0-15
Thread(s) per core: 2
Core(s) per socket: 8
Socket(s): 1
NUMA node(s): 1
Vendor ID: GenuineIntel
CPU family: 6
Model: 79
Model name: Intel(R) Xeon(R) CPU E5-1660 v4 @ 3.20GHz
Stepping: 1
CPU MHz: 1200.049
CPU max MHz: 3800.0000
CPU min MHz: 1200.0000
BogoMIPS: 6400.08
Virtualization: VT-x
L1d cache: 32K
L1i cache: 32K
L2 cache: 256K
L3 cache: 20480K
NUMA node0 CPU(s): 0-15
Flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc cpuid aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid dca sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm 3dnowprefetch cpuid_fault epb cat_l3 cdp_l3 invpcid_single pti intel_ppin ssbd ibrs ibpb stibp tpr_shadow vnmi flexpriority ept vpid ept_ad fsgsbase tsc_adjust bmi1 hle avx2 smep bmi2 erms invpcid rtm cqm rdt_a rdseed adx smap intel_pt xsaveopt cqm_llc cqm_occup_llc cqm_mbm_total cqm_mbm_local dtherm ida arat pln pts flush_l1d
I have implemented dask to distribute some calculations and I am setting a Client() this way:
from dask.distributed import Client, LocalCluster

if __name__ == '__main__':
    cluster = LocalCluster()
    client = Client(cluster, asyncronous=True, n_workers=8,
                    threads_per_worker=2)
    train()
It definitely seems that dask is using all resources when I call my delayed functions with dask.compute(*computations, scheduler='distributed'), and the dashboard confirms that all workers are busy.
Now, if I go ahead and change my Client() to:
if __name__ == '__main__':
    cluster = LocalCluster()
    client = Client(cluster, asyncronous=True, n_workers=4,
                    threads_per_worker=2)
    train()
I would expect to be using half of my resources, but that is not what the dashboard shows.
Why is the dask Client() still using all the resources? I would appreciate any input on this.
The Client class will make a cluster for you if you haven't already specified one. Those keywords only have an effect when you are not passing an existing cluster instance, so you should instead put them into your call to LocalCluster:
cluster = LocalCluster(n_workers=4, threads_per_worker=2)
client = Client(cluster, asynchronous=True)
or you can simply skip making the cluster yourself:
client = Client(asynchronous=True, n_workers=4, threads_per_worker=2)
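To confirm what you actually got, you can ask the scheduler afterwards (a small sketch; on older versions of distributed the per-worker key is 'ncores' rather than 'nthreads'):

info = client.scheduler_info()
workers = info['workers']
print(len(workers), 'workers')
print(sum(w['nthreads'] for w in workers.values()), 'total threads')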