I am working with the code published on GitHub at https://github.com/jrterven/audio-visual-dataset/blob/master/extract_detailed_text_watson.py. The code was designed to use 5 words at a time, but I want to change it to 1 word. When I try to do that in the code, the run fails with the error shown below:
found 2 files
Processing video: health_news_2.mp4
video resolution: 608 x 1080
video framerate: 29.97002997002997
entry: <class 'dict'> {'link': 'build_Dataset', 'text': 'صورة', 'conf': 0.61, 'start': 2.07, 'end': 2.55, 'bounding_box': []}
s_sec, s_millisec: 2.0 69.99999999999984
entry: <class 'dict'> {'link': 'build_Dataset', 'text': 'مجهرية', 'conf': 0.97, 'start': 2.55, 'end': 3.24, 'bounding_box': []}
s_sec, s_millisec: 2.0 549.9999999999998
/Users/shaimaa/opt/anaconda3/lib/python3.8/site-packages/torch/nn/functional.py:780: UserWarning: Note that order of the arguments: ceil_mode and return_indices will changeto match the args list in nn.MaxPool2d in a future release.
warnings.warn("Note that order of the arguments: ceil_mode and return_indices will change"
entry: <class 'dict'> {'link': 'build_Dataset', 'text': 'مجهرية', 'conf': 0.97, 'start': 2.55, 'end': 3.24, 'bounding_box': [230, 126, 131, 171]}
ffmpeg version 4.4.1 Copyright (c) 2000-2021 the FFmpeg developers
built with Apple clang version 13.0.0 (clang-1300.0.29.3)
configuration: --prefix=/usr/local/Cellar/ffmpeg/4.4.1_3 --enable-shared --enable-pthreads --enable-version3 --cc=clang --host-cflags= --host-ldflags= --enable-ffplay --enable-gnutls --enable-gpl --enable-libaom --enable-libbluray --enable-libdav1d --enable-libmp3lame --enable-libopus --enable-librav1e --enable-librist --enable-librubberband --enable-libsnappy --enable-libsrt --enable-libtesseract --enable-libtheora --enable-libvidstab --enable-libvmaf --enable-libvorbis --enable-libvpx --enable-libwebp --enable-libx264 --enable-libx265 --enable-libxml2 --enable-libxvid --enable-lzma --enable-libfontconfig --enable-libfreetype --enable-frei0r --enable-libass --enable-libopencore-amrnb --enable-libopencore-amrwb --enable-libopenjpeg --enable-libspeex --enable-libsoxr --enable-libzmq --enable-libzimg --disable-libjack --disable-indev=jack --enable-avresample --enable-videotoolbox
libavutil 56. 70.100 / 56. 70.100
libavcodec 58.134.100 / 58.134.100
libavformat 58. 76.100 / 58. 76.100
libavdevice 58. 13.100 / 58. 13.100
libavfilter 7.110.100 / 7.110.100
libavresample 4. 0. 0 / 4. 0. 0
libswscale 5. 9.100 / 5. 9.100
libswresample 3. 9.100 / 3. 9.100
libpostproc 55. 9.100 / 55. 9.100
Input #0, mov,mp4,m4a,3gp,3g2,mj2, from 'build_Dataset/news/health_news_2.mp4':
Metadata:
major_brand : isom
minor_version : 512
compatible_brands: isomiso2avc1mp41
encoder : Lavf58.45.100
Duration: 00:04:50.88, start: 0.000000, bitrate: 603 kb/s
Stream #0:0(und): Video: h264 (High) (avc1 / 0x31637661), yuv420p(tv, bt709), 608x1080 [SAR 1:1 DAR 76:135], 468 kb/s, 29.97 fps, 29.97 tbr, 30k tbn, 59.94 tbc (default)
Metadata:
handler_name : ISO Media file produced by Google Inc.
vendor_id : [0][0][0][0]
Stream #0:1(eng): Audio: aac (LC) (mp4a / 0x6134706D), 44100 Hz, stereo, fltp, 127 kb/s (default)
Metadata:
handler_name : ISO Media file produced by Google Inc.
vendor_id : [0][0][0][0]
Stream mapping:
Stream #0:0 -> #0:0 (h264 (native) -> h264 (libx264))
Stream #0:1 -> #0:1 (copy)
Press [q] to stop, [?] for help
[libx264 @ 0x7f7fe8810800] using SAR=1/1
[libx264 @ 0x7f7fe8810800] using cpu capabilities: MMX2 SSE2Fast SSSE3 SSE4.2 AVX FMA3 BMI2 AVX2
[libx264 @ 0x7f7fe8810800] profile High, level 1.1, 4:2:0, 8-bit
[libx264 @ 0x7f7fe8810800] 264 - core 163 r3060 5db6aa6 - H.264/MPEG-4 AVC codec - Copyleft 2003-2021 - http://www.videolan.org/x264.html - options: cabac=1 ref=3 deblock=1:0:0 analyse=0x3:0x113 me=hex subme=7 psy=1 psy_rd=1.00:0.00 mixed_ref=1 me_range=16 chroma_me=1 trellis=1 8x8dct=1 cqm=0 deadzone=21,11 fast_pskip=1 chroma_qp_offset=-2 threads=5 lookahead_threads=1 sliced_threads=0 nr=0 decimate=1 interlaced=0 bluray_compat=0 constrained_intra=0 bframes=3 b_pyramid=2 b_adapt=1 b_bias=0 direct=1 weightb=1 open_gop=0 weightp=2 keyint=250 keyint_min=25 scenecut=40 intra_refresh=0 rc_lookahead=40 rc=crf mbtree=1 crf=23.0 qcomp=0.60 qpmin=0 qpmax=69 qpstep=4 ip_ratio=1.40 aq=1:1.00
Output #0, mp4, to '/Users/shaimaa/Downloads/LIP_Reading/Code/audio-visual-dataset-master/results_news/news/health_news_2/1.mp4':
Metadata:
major_brand : isom
minor_version : 512
compatible_brands: isomiso2avc1mp41
encoder : Lavf58.76.100
Stream #0:0(und): Video: h264 (avc1 / 0x31637661), yuv420p(tv, bt709, progressive), 130x170 [SAR 1:1 DAR 13:17], q=2-31, 29.97 fps, 30k tbn (default)
Metadata:
handler_name : ISO Media file produced by Google Inc.
vendor_id : [0][0][0][0]
encoder : Lavc58.134.100 libx264
Side data:
cpb: bitrate max/min/avg: 0/0/0 buffer size: 0 vbv_delay: N/A
Stream #0:1(eng): Audio: aac (LC) (mp4a / 0x6134706D), 44100 Hz, stereo, fltp, 127 kb/s (default)
Metadata:
handler_name : ISO Media file produced by Google Inc.
vendor_id : [0][0][0][0]
frame= 21 fps=0.0 q=-1.0 Lsize= 18kB time=00:00:00.67 bitrate= 222.3kbits/s dup=1 drop=0 speed=4.86x
video:5kB audio:11kB subtitle:0kB other streams:0kB global headers:0kB muxing overhead: 13.600434%
[libx264 @ 0x7f7fe8810800] frame I:1 Avg QP:23.12 size: 1388
[libx264 @ 0x7f7fe8810800] frame P:5 Avg QP:25.05 size: 401
[libx264 @ 0x7f7fe8810800] frame B:15 Avg QP:30.82 size: 85
[libx264 @ 0x7f7fe8810800] consecutive B-frames: 4.8% 0.0% 0.0% 95.2%
[libx264 @ 0x7f7fe8810800] mb I I16..4: 3.0% 80.8% 16.2%
[libx264 @ 0x7f7fe8810800] mb P I16..4: 1.4% 2.6% 0.4% P16..4: 51.3% 19.2% 6.3% 0.0% 0.0% skip:18.8%
[libx264 @ 0x7f7fe8810800] mb B I16..4: 0.0% 0.1% 0.1% B16..8: 31.9% 2.7% 0.5% direct: 0.7% skip:64.0% L0:31.9% L1:64.7% BI: 3.4%
[libx264 @ 0x7f7fe8810800] 8x8 transform intra:76.4% inter:84.5%
[libx264 @ 0x7f7fe8810800] coded y,uvDC,uvAC intra: 75.0% 89.4% 42.3% inter: 7.0% 6.6% 0.3%
[libx264 @ 0x7f7fe8810800] i16 v,h,dc,p: 0% 40% 10% 50%
[libx264 @ 0x7f7fe8810800] i8 v,h,dc,ddl,ddr,vr,hd,vl,hu: 15% 32% 8% 5% 7% 8% 7% 10% 9%
[libx264 @ 0x7f7fe8810800] i4 v,h,dc,ddl,ddr,vr,hd,vl,hu: 21% 26% 9% 7% 7% 10% 4% 10% 6%
[libx264 @ 0x7f7fe8810800] i8c dc,h,v,p: 38% 34% 20% 8%
[libx264 @ 0x7f7fe8810800] Weighted P-Frames: Y:0.0% UV:0.0%
[libx264 @ 0x7f7fe8810800] ref P L0: 68.6% 9.7% 15.4% 6.2%
[libx264 @ 0x7f7fe8810800] ref B L0: 91.5% 5.4% 3.1%
[libx264 @ 0x7f7fe8810800] ref B L1: 89.9% 10.1%
[libx264 @ 0x7f7fe8810800] kb/s:53.25
entry: <class 'dict'> {'link': 'build_Dataset', 'text': 'تظهر', 'conf': 1.0, 'start': 3.24, 'end': 3.66, 'bounding_box': []}
s_sec, s_millisec: 3.0 240.00000000000023
entry: <class 'dict'> {}
Traceback (most recent call last):
  File "extract_subvideos.py", line 424, in <module>
    main(args)
  File "extract_subvideos.py", line 140, in main
    s_sec, s_millisec = divmod(float(entry['start']), 1)
KeyError: 'start'
This is the part of the code where I changed only the number of words:
fa = FaceAlignment()

videos_directory = args.videos_dir
results_dir = args.results_dir
vids_name = args.category
vid_proc_name = args.log_file
dataset_annotation_file = args.ann_file

if args.save_videos == 'True':
    save_videos = True
else:
    save_videos = False

# Create video window
cv2.namedWindow('Vid')

# load or create list with processed files
processed_files = []
videos_processed_exists = os.path.isfile(
    os.path.join(results_dir, vid_proc_name))
if not videos_processed_exists:
    with open(os.path.join(results_dir, vid_proc_name), "w") as fp:
        for pfiles in processed_files:
            print(pfiles, file=fp)
else:
    with open(os.path.join(results_dir, vid_proc_name)) as fp:
        processed_files = fp.read().splitlines()

# Create annotation file the first time
annotation_exists = os.path.isfile(os.path.join(
    results_dir, dataset_annotation_file))
if not annotation_exists:
    try:
        with open(os.path.join(
                results_dir, dataset_annotation_file), 'w') as csvfile:
            writer = csv.DictWriter(csvfile, fieldnames=csv_columns)
            writer.writeheader()
    except IOError:
        print("Error creating annotation file. I/O error")

# Get json files list names in videos directory
files_list = []
for ann_file in os.listdir(os.path.join(videos_directory, vids_name)):
    if ann_file.endswith(".json"):
        files_list.append(ann_file[0:-5])
files_list = natsorted(files_list)
num_files = len(files_list)
print('found', num_files, 'files')

# traverse all the files
stop_videos = False
for file in files_list:
    if stop_videos:
        break
    # skip the current video if it has already been processed
    if file in processed_files:
        print(file, 'has already been processed. Skipping it.')
        continue

    num_output_video = 0

    # Search for the video files in videos_directory
    video_name = file + '.mp4'
    print('Processing video:', video_name)

    if save_videos:
        # create output directory
        output_dir = os.path.join(results_dir, vids_name, file)
        if not os.path.isdir(output_dir):
            os.mkdir(output_dir)

    # Load watson results
    with open(os.path.join(
            videos_directory, vids_name, file + '.json')) as f:
        stt_results = json.load(f)

    # Extract all the words with confidence > 90
    words_data = extract_words_from_watson_results(stt_results, max_words=5)

    # Start the video capture
    cap = cv2.VideoCapture(os.path.join(
        videos_directory, vids_name, video_name))

    # Extract video metadata
    height = int(cap.get(cv2.CAP_PROP_FRAME_HEIGHT))
    width = int(cap.get(cv2.CAP_PROP_FRAME_WIDTH))
    fps = cap.get(cv2.CAP_PROP_FPS)
    print('video resolution:', width, ' x ', height)
    print('video framerate:', fps)

    frame_count = 0
    fps_processing = 30.0  # fps holder
    t = cv2.getTickCount()  # initiate the tickCounter
    count = 0

    for entry in words_data:
        # Extract speech to text data
        print('entry:', type(entry), entry)
        s_sec, s_millisec = divmod(float(entry['start']), 1)
        e_sec, e_millisec = divmod(float(entry['end']), 1)
        s_min = 0
        e_min = 0
        s_millisec = s_millisec * 1000
        e_millisec = e_millisec * 1000

        print('s_sec, s_millisec:', s_sec, s_millisec)

        if s_sec >= 60:
            s_min = math.floor(s_sec / 60.0)
            s_sec = s_sec % 60
        if e_sec >= 60:
            e_min = math.floor(e_sec / 60.0)
            e_sec = e_sec % 60

        # Determine video frames involved in stt entry
        min_frame = s_min*fps*60 + (s_sec*fps)
        max_frame = e_min*fps*60 + (e_sec*fps)

        # go to min_frame
        cap.set(cv2.CAP_PROP_POS_FRAMES, min_frame)
        frame_count = min_frame

        # read frames from min_frame to max_frame
        num_people = 0
        valid_video = True
        landmarks = []
        angles = []
        consecutive_frames_no_people = 0
        while frame_count < max_frame:
            if count == 0:
                t = cv2.getTickCount()

            # capture next frame
            ret, frame = cap.read()
            if not ret:
                continue
            frame_count += 1

            # resize frame for faster processing
            if frame.shape[0] <= 0 or frame.shape[1] <= 0:
                continue
            frame_small = cv2.resize(frame, (0, 0), fx=scale, fy=scale,
                                     interpolation=cv2.INTER_LINEAR)

            # detect faces and landmarks
            fa.update_features(frame_small)
            landmarks.append(fa.get_mouth_features(scale=scale))
            num_people = fa.get_num_people()
            angles.append(fa.get_yaw())

            # if it detects less than or more than 1 person
            # go to next subtitle
            if num_people != 1:
                consecutive_frames_no_people += 1
                if consecutive_frames_no_people >= max_bad_frames:
                    print(consecutive_frames_no_people,
                          ' frames without 1 person. Skipping to next subtitle')
                    valid_video = False
                    break

            # if only one person in the scene
            if num_people == 1:
                consecutive_frames_no_people = 0
                fa.renderMouth(frame_small)
                # Put fps at which we are processing camera feed on frame
                cv2.putText(frame_small, "{0:.2f}-fps".format(fps_processing),
                            (50, height - 50), cv2.FONT_HERSHEY_COMPLEX,
                            1, (0, 0, 255), 2)

            # Display the image
            cv2.imshow('Vid', frame_small)

            # Read keyboard and exit if ESC was pressed
            k = cv2.waitKey(1) & 0xFF
            if k == 27:
                exit()
            elif k == ord('q'):
                stop_videos = True

            # increment frame counter
            count = count + 1
            # calculate fps at an interval of 30 frames
            if count == 30:
                t = (cv2.getTickCount() - t) / cv2.getTickFrequency()
                fps_processing = 30.0 / t
                count = 0

        # if this was a valid video
        if valid_video and len(landmarks) > 0:
            num_output_video += 1
            entry['mouth3d'] = landmarks
            entry['angle'] = angles

            if save_videos:
                s_hr = 0
                e_hr = 0
                if s_min >= 60:
                    s_hr = math.floor(s_min / 60)
                    s_min = s_min % 60
                if e_min >= 60:
                    e_hr = math.floor(e_min / 60)
                    e_min = e_min % 60

                # cut and crop video
                # ffmpeg -i input.mp4 -ss hh:mm:ss -filter:v crop=w:h:x:y -c:a copy -to hh:mm:ss output.mp4
                ss = "{0:02d}:{1:02d}:{2:02d}.{3:03d}".format(
                    s_hr, s_min, int(s_sec), math.ceil(s_millisec))
                es = "{0:02d}:{1:02d}:{2:02d}.{3:03d}".format(
                    e_hr, e_min, int(e_sec), math.ceil(e_millisec))
                crop = "crop={0:1d}:{1:1d}:{2:1d}:{3:1d}".format(
                    bbw, bbh, bbx1, bby1)
                out_name = os.path.join(output_dir, str(num_output_video))
                subprocess.call(['ffmpeg',  # '-hide_banner', '-loglevel', 'panic',
                                 '-i', os.path.join(
                                     videos_directory, vids_name, video_name),
                                 '-ss', ss,
                                 '-filter:v', crop, '-c:a', 'copy',
                                 '-to', es, out_name + '.mp4'])

                # save recognized speech
                text_file = open(out_name + '.txt', "w")
                text_file.write(entry['text'] + '\n')
                text_file.write(str(entry['conf']))
                text_file.close()

    # append results to annotation file
    append_annotation_file(os.path.join(
        results_dir, dataset_annotation_file), words_data)

    # save name of processed file
    processed_files.append(file)
    with open(os.path.join(results_dir, vid_proc_name), "w") as fp:
        for p_file in processed_files:
            print(p_file, file=fp)

# Release resources
cap.release()
cv2.destroyAllWindows()
def extract_text_conf_ts(s_idx, max_words, num_words, timestamps, conf, link):
    text = ''
    avg_conf = 0
    start = timestamps[int(s_idx * max_words)][1]
    end = timestamps[int(s_idx * max_words + num_words - 1)][2]
    for w_idx in range(num_words):
        text = text + ' ' + timestamps[int(s_idx * max_words + w_idx)][0]
        avg_conf += conf[int(s_idx * max_words + w_idx)][1]
    avg_conf = round(avg_conf / num_words, 2)

    if len(text.strip()) >= 4:
        out_entry = {'link': link, 'text': text.strip(), 'conf': avg_conf,
                     'start': start, 'end': end, 'mouth3d': [],
                     'angle': []}
    else:
        out_entry = {}
    return out_entry
def extract_words_from_watson_results(stt_results, max_words=5):
    data = stt_results['results']
    link = stt_results['link']
    link = link.rsplit('/', 1)[-1]
    out_data = []
    for sentence_idx, ann in enumerate(data):
        data_ann = ann['alternatives'][0]
        text = data_ann['transcript']
        conf = data_ann['word_confidence']
        timestamps = data_ann['timestamps']

        num_words = len(timestamps)
        num_splits = num_words // max_words
        rest = num_words % max_words

        if num_words < max_words:
            maxx_words = num_words
        else:
            maxx_words = max_words

        for s_idx in range(num_splits):
            out_entry = extract_text_conf_ts(s_idx, maxx_words, maxx_words,
                                             timestamps, conf, link)
            out_data.append(out_entry)

        if rest > 0:
            out_entry = extract_text_conf_ts(num_splits, maxx_words, rest,
                                             timestamps, conf, link)
            if out_entry:
                out_data.append(out_entry)
    return out_data
def append_annotation_file(csv_file, data):
    try:
        with open(csv_file, 'a') as csvfile:
            writer = csv.DictWriter(csvfile, fieldnames=csv_columns)
            for entry in data:
                writer.writerow(entry)
    except IOError:
        print("I/O error")
def remove_accents(input_str):
    nfkd_form = unicodedata.normalize('NFKD', input_str)
    only_ascii = nfkd_form.encode('ASCII', 'ignore')
    return only_ascii
if __name__ == "__main__":
    # Parse input arguments
    parser = argparse.ArgumentParser(description='Extract subvideos')
    parser.add_argument('--dir', dest='videos_dir',
                        help='Directory with videos', type=str)
    parser.add_argument('--cat', dest='category',
                        help='Video category', type=str)
    parser.add_argument('--vids_log', dest='log_file',
                        help='Name of log file', type=str)
    parser.add_argument('--results_dir', dest='results_dir',
                        help='Directory with results', type=str)
    parser.add_argument('--ann_file', dest='ann_file',
                        help='Annotations file (csv)', type=str)
    parser.add_argument('--save_videos', dest='save_videos',
                        help='Save videos', type=str, default='False')
    args = parser.parse_args()
    main(args)
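A note on the failure, judging from the traceback: with max_words=1, extract_text_conf_ts returns an empty dict whenever the assembled text is shorter than 4 characters (common for single Arabic words), and the split loop in extract_words_from_watson_results appends that empty dict without the if out_entry: guard that the rest branch already has. The main loop then calls entry['start'] on {} and raises the KeyError. A minimal sketch of the missing guard, assuming this is indeed the cause (the 4-character threshold may also need lowering for single words):

# In extract_words_from_watson_results, guard the split loop the same
# way the `rest` branch already does:
for s_idx in range(num_splits):
    out_entry = extract_text_conf_ts(s_idx, maxx_words, maxx_words,
                                     timestamps, conf, link)
    if out_entry:  # skip the empty dicts produced for very short text
        out_data.append(out_entry)

Equivalently, the loop in main() can skip empty entries defensively:

for entry in words_data:
    if not entry:
        continue
    # ... rest of the loop body unchanged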
I'm trying to apply a custom Python function to every frame of a video and create a video with the modified frames as output. My input is an MKV file with a variable framerate, and I'd like the output to be the same, so that each frame in the input matches one in the output at the exact same time.
I tried to use this example from ffmpeg-python. However, it seems that the timestamp information is lost in the pipes: the output video has 689 frames when the input only has 300, and the durations don't match either (27 s for the output vs 11 s for the input).
I also tried first processing each frame of the video and saving the transformed versions as PNGs, then "masking" the input video with the processed frames. This seems better, because the output video has the same 11 s duration as the input, but the frame counts still don't match (313 vs 300).
Code for the ffmpeg-python solution:
width = 1920
height = 1080

process1 = (
    ffmpeg
    .input(in_filename)
    .output('pipe:', format='rawvideo', pix_fmt='rgb24')
    .run_async(pipe_stdout=True)
)

process2 = (
    ffmpeg
    .input('pipe:', format='rawvideo', pix_fmt='rgb24', s='{}x{}'.format(width, height))
    .output(out_filename, pix_fmt='yuv420p')
    .overwrite_output()
    .run_async(pipe_stdin=True)
)

while True:
    in_bytes = process1.stdout.read(width * height * 3)
    if not in_bytes:
        break
    in_frame = (
        np
        .frombuffer(in_bytes, np.uint8)
        .reshape([height, width, 3])
    )

    # Just add 1 to the pixels for the example
    out_frame = in_frame + 1

    process2.stdin.write(
        out_frame
        .astype(np.uint8)
        .tobytes()
    )

process2.stdin.close()
process1.wait()
process2.wait()
Code for the overlay solution:
ffmpeg -i in.mkv -i test/%d.png -filter_complex "[0][1]overlay=0:0" -copyts out.mkv
Is there any other solution I haven't thought of to do what I'm trying to do? It doesn't seem like it should be that complicated, but I can't find a way to do it.
Thanks for any help!
UPDATE:
Here are the logs for the input and output pipes of the ffmpeg-python solution.
Input
Input #0, matroska,webm, from 'in.mkv':
Metadata:
ENCODER : Lavf59.17.100
Duration: 00:00:11.48, start: 0.000000, bitrate: 45702 kb/s
Stream #0:0: Video: h264 (High 4:4:4 Predictive), yuvj420p(pc, gbr/unknown/unknown, progressive), 1920x1080 [SAR 1:1 DAR 16:9], 60 fps, 60 tbr, 1k tbn (default)
Metadata:
ENCODER : Lavc58.134.100 h264_nvenc
DURATION : 00:00:11.483000000
Stream mapping:
Stream #0:0 -> #0:0 (h264 (native) -> rawvideo (native))
Output #0, rawvideo, to 'pipe:':
Metadata:
encoder : Lavf59.17.100
Stream #0:0: Video: rawvideo (RGB[24] / 0x18424752), rgb24(pc, gbr/unknown/unknown, progressive), 1920x1080 [SAR 1:1 DAR 16:9], q=2-31, 2985984 kb/s, 60 fps, 60 tbn (default)
Metadata:
DURATION : 00:00:11.483000000
encoder : Lavc59.20.100 rawvideo
frame= 689 fps=154 q=-0.0 Lsize= 4185675kB time=00:00:11.48 bitrate=2985984.1kbits/s dup=389 drop=0 speed=2.57x
Output
Input #0, rawvideo, from 'pipe:':
Duration: N/A, start: 0.000000, bitrate: 1244160 kb/s
Stream #0:0: Video: rawvideo (RGB[24] / 0x18424752), rgb24, 1920x1080, 1244160 kb/s, 25 tbr, 25 tbn
Stream mapping:
Stream #0:0 -> #0:0 (rawvideo (native) -> h264 (libx264))
[libx264 # 0000025afaf11140] using cpu capabilities: MMX2 SSE2Fast SSSE3 SSE4.2 AVX FMA3 BMI2 AVX2 AVX512
[libx264 # 0000025afaf11140] profile High, level 4.0, 4:2:0, 8-bit
[libx264 # 0000025afaf11140] 264 - core 164 r3081 19856cc - H.264/MPEG-4 AVC codec - Copyleft 2003-2021 - http://www.videolan.org/x264.html - options: cabac=1 ref=3 deblock=1:0:0 analyse=0x3:0x113 me=hex subme=7 psy=1 psy_rd=1.00:0.00 mixed_ref=1 me_range=16 chroma_me=1 trellis=1 8x8dct=1 cqm=0 deadzone=21,11 fast_pskip=1 chroma_qp_offset=-2 threads=18 lookahead_threads=3 sliced_threads=0 nr=0 decimate=1 interlaced=0 bluray_compat=0 constrained_intra=0 bframes=3 b_pyramid=2 b_adapt=1 b_bias=0 direct=1 weightb=1 open_gop=0 weightp=2 keyint=250 keyint_min=25 scenecut=40 intra_refresh=0 rc_lookahead=40 rc=crf mbtree=1 crf=23.0 qcomp=0.60 qpmin=0 qpmax=69 qpstep=4 ip_ratio=1.40 aq=1:1.00
Output #0, matroska, to 'images/videos/out.mkv':
Metadata:
encoder : Lavf59.17.100
Stream #0:0: Video: h264 (H264 / 0x34363248), yuv420p(tv, progressive), 1920x1080, q=2-31, 25 fps, 1k tbn
Metadata:
encoder : Lavc59.20.100 libx264
Side data:
cpb: bitrate max/min/avg: 0/0/0 buffer size: 0 vbv_delay: N/A
frame= 689 fps= 11 q=-0.0 Lsize= 4185675kB time=00:00:11.48 bitrate=2985984.1kbits/s dup=389 drop=0 speed=0.181x
video:4185675kB audio:0kB subtitle:0kB other streams:0kB global headers:0kB muxing overhead: 0.000000%
I'll answer my own question, as I've been able to solve the issue with the help of kesh in the comments.
There are basically two things:
vsync passthrough is required for the input video, to keep the number of frames the same;
another external tool (MKVToolNix) has to be used twice: once to extract the timestamps from the initial video, and once to apply them to the output.
Below is the relevant code to perform the whole operation using Python and subprocess. You can run the following command on both the input and output videos to check that the timestamps are indeed the same for each frame: ffprobe -show_entries packet=pts_time,duration_time,stream_index video.mkv
import os
import subprocess

import ffmpeg
import numpy as np

width = 1920
height = 1080

process1 = (
    ffmpeg
    .input('in.mkv', vsync='passthrough')
    .output('pipe:', format='rawvideo', pix_fmt='rgb24')
    .run_async(pipe_stdout=True)
)

process2 = (
    ffmpeg
    .input('pipe:', format='rawvideo', pix_fmt='rgb24', s='{}x{}'.format(width, height))
    .output('temp.mkv', pix_fmt='yuv420p')
    .overwrite_output()
    .run_async(pipe_stdin=True)
)

while True:
    in_bytes = process1.stdout.read(width * height * 3)
    if not in_bytes:
        break
    in_frame = (
        np
        .frombuffer(in_bytes, np.uint8)
        .reshape([height, width, 3])
    )

    # Keep things simple, just add 1 to each pixel
    out_frame = in_frame + 1

    process2.stdin.write(
        out_frame
        .astype(np.uint8)
        .tobytes()
    )

process2.stdin.close()
process1.wait()
process2.wait()
# Extract timestamps from input video
subprocess.run(['mkvextract', 'in.mkv', 'timestamps_v2', '0:timestamps.txt'])
# Apply extracted timestamps to create synchronized output video
subprocess.run(['mkvmerge', '-o', 'out.mkv', '--timestamps', '0:timestamps.txt', 'temp.mkv'])
# Clean up
os.remove('temp.mkv')
os.remove('timestamps.txt')
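If you want to script the ffprobe check mentioned above instead of eyeballing it, here is a small sketch; packet_timestamps is a helper name of my own, and it assumes ffprobe is on your PATH:

def packet_timestamps(path):
    # One pts_time value per packet of the first video stream
    out = subprocess.run(
        ['ffprobe', '-v', 'error', '-select_streams', 'v:0',
         '-show_entries', 'packet=pts_time', '-of', 'csv=p=0', path],
        capture_output=True, text=True).stdout
    return [line for line in out.splitlines() if line]

# Should print True if the timestamps were carried over exactly
print(packet_timestamps('in.mkv') == packet_timestamps('out.mkv'))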
Here is the original command-line call that works in the shell:
ffmpeg -i /Users/abc/Desktop/Test/Full_Mov.mov -vf "drawtext=fontfile=/System/Library/Fonts/Keyboard.ttf: text='SCENE BLAH BLAH - %{frame_num}': start_number=1: x=(w-tw)/2: y=h-(2*lh): fontcolor=white: fontsize=20: box=1: boxcolor=black: boxborderw=5","format=yuv420p" -f segment -segment_frames 123 -reset_timestamps 1 -c:a copy -map 0 "/Users/abc/Desktop/Test/%03d_test40.mov"
I'm getting a negative exit status when trying to run ffmpeg via subprocess, causing it to fail and not process the video. I've even tried dumbing down the call to just run ffmpeg by itself with no arguments, and it still fails with a non-zero status.
import subprocess
ffmpeg = "/usr/local/bin/ffmpeg"
source = "/Users/abc/Desktop/Test/Full_Mov.mov"
destination = "/Users/abc/Desktop/Test/%03d_test40.mov"
cmd = "%s -i %s -vf \"drawtext=fontfile=/System/Library/Fonts/Keyboard.ttf: text='SCENE BLAH BLAH - %%{frame_num}': start_number=1: x=(w-tw)/2: y=h-(2*lh): fontcolor=black: fontsize=20: box=1: boxcolor=white: boxborderw=5\",\"format=yuv420p\" -f segment -segment_frames 123 -reset_timestamps 1 -c:a copy -map 0 \"%s\"" % (ffmpeg, source, destination)
log("ffmpeg cmd: %s" % cmd)
log(subprocess.check_output(cmd, shell=True))
When running the full command:
CalledProcessError: Command '/usr/local/bin/ffmpeg -i /Users/abc/Desktop/Full_Mov.mov -vf "drawtext=fontfile=/System/Library/Fonts/Keyboard.ttf: text='SCENE BLAH BLAH - %{frame_num}': start_number=1: x=(w-tw)/2: y=h-(2*lh): fontcolor=black: fontsize=20: box=1: boxcolor=white: boxborderw=5","format=yuv420p" -f segment -segment_frames 318 -reset_timestamps 1 -c:a copy -map 0 "/Users/abc/Desktop/Test/%03d.tmp.mov"' returned non-zero exit status -8
raise CalledProcessError(retcode, cmd, output=output)
When running just ffmpeg with no arguments:
CalledProcessError: Command '/usr/local/bin/ffmpeg' returned non-zero exit status 1
raise CalledProcessError(retcode, cmd, output=output)
Latest output with the suggested changes:
subprocess.CalledProcessError: Command '['/usr/local/bin/ffmpeg', '-i', '/Users/szaharak/Desktop/Flix_Test/ep888_sq66_main_mov_2019_05_20_14_15.mov', '-vf', 'drawtext=fontfile=/System/Library/Fonts/Keyboard.ttf:', 'text=SCENE BLAH BLAH - %%{frame_num}:', 'start_number=1:', 'x=(w-tw)/2:', 'y=h-(2*lh):', 'fontcolor=black:', 'fontsize=20:', 'box=1:', 'boxcolor=white:', 'boxborderw=5', 'format=yuv420p', '-f', 'segment', '-segment_frames', '123', '-reset_timestamps', '1', '-c:a', 'copy', '-map', '0', '/Users/szaharak/Desktop/Flix_Test/%03d_test40.mov']' returned non-zero exit status 1
[NULL @ 0x7f88c3012200] Unable to find a suitable output format for 'text=SCENE BLAH BLAH - %%{frame_num}:'
text=SCENE BLAH BLAH - %%{frame_num}:: Invalid argument
And here is the latest...
>>> rc = subprocess.check_call(cmd)
ffmpeg version N-93891-ge1839283bc-tessus https://evermeet.cx/ffmpeg/ Copyright (c) 2000-2019 the FFmpeg developers
built with Apple LLVM version 10.0.1 (clang-1001.0.46.4)
configuration: --cc=/usr/bin/clang --prefix=/opt/ffmpeg --extra-version=tessus --enable-avisynth --enable-fontconfig --enable-gpl --enable-libaom --enable-libass --enable-libbluray --enable-libdav1d --enable-libfreetype --enable-libgsm --enable-libmodplug --enable-libmp3lame --enable-libmysofa --enable-libopencore-amrnb --enable-libopencore-amrwb --enable-libopenh264 --enable-libopenjpeg --enable-libopus --enable-librubberband --enable-libshine --enable-libsnappy --enable-libsoxr --enable-libspeex --enable-libtheora --enable-libtwolame --enable-libvidstab --enable-libvo-amrwbenc --enable-libvorbis --enable-libvpx --enable-libwavpack --enable-libwebp --enable-libx264 --enable-libx265 --enable-libxavs --enable-libxvid --enable-libzimg --enable-libzmq --enable-libzvbi --enable-version3 --pkg-config-flags=--static --disable-ffplay
libavutil 56. 28.100 / 56. 28.100
libavcodec 58. 52.102 / 58. 52.102
libavformat 58. 27.103 / 58. 27.103
libavdevice 58. 7.100 / 58. 7.100
libavfilter 7. 53.101 / 7. 53.101
libswscale 5. 4.101 / 5. 4.101
libswresample 3. 4.100 / 3. 4.100
libpostproc 55. 4.100 / 55. 4.100
Input #0, mov,mp4,m4a,3gp,3g2,mj2, from '/Users/szaharak/Desktop/Flix_Test/ep888_sq66_main_mov_2019_05_20_14_15.mov':
Metadata:
major_brand : qt
minor_version : 537199360
compatible_brands: qt
creation_time : 2019-05-14T17:58:24.000000Z
com.apple.quicktime.player.movie.audio.gain: 1.000000
com.apple.quicktime.player.movie.audio.treble: 0.000000
com.apple.quicktime.player.movie.audio.bass: 0.000000
com.apple.quicktime.player.movie.audio.balance: 0.000000
com.apple.quicktime.player.movie.audio.pitchshift: 0.000000
com.apple.quicktime.player.movie.audio.mute:
com.apple.quicktime.player.movie.visual.brightness: 0.000000
com.apple.quicktime.player.movie.visual.color: 1.000000
com.apple.quicktime.player.movie.visual.tint: 0.000000
com.apple.quicktime.player.movie.visual.contrast: 1.000000
com.apple.quicktime.player.version: 7.6.6 (7.6.6)
com.apple.quicktime.version: 7.7.3 (2943.14) 0x7738000 (Mac OS X, 10.11.6, 15G22010)
Duration: 00:01:12.67, start: 0.000000, bitrate: 23379 kb/s
Stream #0:0(eng): Video: mjpeg (Baseline) (jpeg / 0x6765706A), yuvj422p(pc, bt470bg/unknown/unknown), 1280x720 [SAR 72:72 DAR 16:9], 21838 kb/s, 24 fps, 24 tbr, 24k tbn, 24k tbc (default)
Metadata:
creation_time : 2019-05-14T17:58:24.000000Z
handler_name : Apple Video Media Handler
encoder : Photo - JPEG
Stream #0:1(eng): Audio: pcm_s16be (twos / 0x736F7774), 48000 Hz, stereo, s16, 1536 kb/s (default)
Metadata:
creation_time : 2019-05-14T17:58:24.000000Z
handler_name : Apple Sound Media Handler
[NULL @ 0x7f8ddc8ce200] Unable to find a suitable output format for 'text=SCENE BLAH BLAH - %{frame_num}: start_number=1: x=(w-tw)/2: y=h-(2*lh): fontcolor=black: fontsize=20: box=1: boxcolor=white: boxborderw=5: format=yuv420p'
text=SCENE BLAH BLAH - %{frame_num}: start_number=1: x=(w-tw)/2: y=h-(2*lh): fontcolor=black: fontsize=20: box=1: boxcolor=white: boxborderw=5: format=yuv420p: Invalid argument
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/Cellar/python@2/2.7.15_1/Frameworks/Python.framework/Versions/2.7/lib/python2.7/subprocess.py", line 190, in check_call
    raise CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command '['/usr/local/bin/ffmpeg', '-i', '/Users/szaharak/Desktop/Flix_Test/ep888_sq66_main_mov_2019_05_20_14_15.mov', '-vf', 'drawtext=fontfile=/System/Library/Fonts/Keyboard.ttf:', 'text=SCENE BLAH BLAH - %{frame_num}: start_number=1: x=(w-tw)/2: y=h-(2*lh): fontcolor=black: fontsize=20: box=1: boxcolor=white: boxborderw=5: format=yuv420p', '-f', 'segment', '-segment_frames', '123', '-reset_timestamps', '1', '-c:a', 'copy', '-map', '0', '/Users/szaharak/Desktop/Flix_Test/%03d_test40.mov']' returned non-zero exit status 1
The ffmpeg command line is tricky: there are a lot of special arguments, mistakes are sometimes difficult to understand, and the program sometimes interprets or discards the wrong arguments, resulting in cryptic error messages.
Another difficulty is your use of check_output. You don't really need it, and if the program fails you won't get any output at all.
I would try the following (which I could not test):
drop shell=True;
pass a list of arguments instead of composing the string yourself, which lets you forget about quoting and escaping. You don't need to format the command, source and destination into one string, since they're standalone arguments;
don't log, just print, and use check_call. As a bonus, you'll get the output in real time instead of at the end.
Note that I have also removed the single quotes in "text=SCENE BLAH BLAH - %{frame_num}:" and unescaped the % character.
Like this:
# The whole -vf value must be a single list element; note the comma before
# format=yuv420p, which chains it as a second filter after drawtext.
cmd = [ffmpeg, "-i", source, "-vf",
       "drawtext=fontfile=/System/Library/Fonts/Keyboard.ttf: "
       "text=SCENE BLAH BLAH - %{frame_num}: start_number=1: x=(w-tw)/2: "
       "y=h-(2*lh): fontcolor=black: fontsize=20: box=1: boxcolor=white: "
       "boxborderw=5,format=yuv420p",
       "-f", "segment", "-segment_frames", "123",
       "-reset_timestamps", "1", "-c:a", "copy", "-map", "0", destination]
log("ffmpeg cmd: %s" % cmd)
rc = subprocess.check_call(cmd)  # raises CalledProcessError if ffmpeg fails
if rc:
    raise Exception("ffmpeg failed")
For ffmpeg specifically, you could also migrate to the ffmpeg-python module, which could save you some argument-parsing trouble too.
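If you go that route, here is an untested sketch with ffmpeg-python, reusing the source and destination variables from above; note that it only maps the filtered video stream (the -map 0 / -c:a copy parts of the original command are left out for brevity):

import ffmpeg  # the ffmpeg-python package

(
    ffmpeg
    .input(source)
    .drawtext(text='SCENE BLAH BLAH - %{frame_num}',
              fontfile='/System/Library/Fonts/Keyboard.ttf',
              start_number=1, x='(w-tw)/2', y='h-(2*lh)',
              fontcolor='black', fontsize=20,
              box=1, boxcolor='white', boxborderw=5)
    .filter('format', 'yuv420p')
    .output(destination, f='segment', segment_frames='123',
            reset_timestamps=1)
    .run()
)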
I am using the Stanford dependency parser and I get the following output for the sentence
I shot an elephant in my sleep
>>>python dep_parsing.py
[((u'shot', u'VBD'), u'nsubj', (u'I', u'PRP')), ((u'shot', u'VBD'), u'dobj', (u'elephant', u'NN')), ((u'elephant', u'NN'), u'det', (u'an', u'DT')), ((u'shot', u'VBD'), u'nmod', (u'sleep', u'NN')), ((u'sleep', u'NN'), u'case', (u'in', u'IN')), ((u'sleep', u'NN'), u'nmod:poss', (u'my', u'PRP$'))]
However, I want numbered tokens in the output, just as shown here:
nsubj(shot-2, I-1)
root(ROOT-0, shot-2)
det(elephant-4, an-3)
dobj(shot-2, elephant-4)
case(sleep-7, in-5)
nmod:poss(sleep-7, my-6)
nmod(shot-2, sleep-7)
Here is my code so far:
from nltk.parse.stanford import StanfordDependencyParser
stanford_parser_dir = 'stanford-parser/'
eng_model_path = stanford_parser_dir + "stanford-parser-models/edu/stanford/nlp/models/lexparser/englishRNN.ser.gz"
my_path_to_models_jar = stanford_parser_dir + "stanford-parser-3.5.2-models.jar"
my_path_to_jar = stanford_parser_dir + "stanford-parser.jar"
dependency_parser = StanfordDependencyParser(path_to_jar=my_path_to_jar, path_to_models_jar=my_path_to_models_jar)
result = dependency_parser.raw_parse('I shot an elephant in my sleep')
dep = result.next()
a = list(dep.triples())
print a
How can I get such an output?
Write a recursive function that traverses your tree. As a first pass, just try assigning the numbers to the words.
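For instance, here is an untested sketch (assuming a recent NLTK whose DependencyGraph exposes a nodes dict) that skips the triples entirely and reads the token indices straight off the graph: every node stores its address (the token number), word, relation, and head address, which is exactly what the numbered format needs.

def print_numbered(dep):
    # dep is the DependencyGraph returned by raw_parse(...).next()
    for address in sorted(dep.nodes):
        node = dep.nodes[address]
        if node['word'] is None:  # skip the artificial node 0 itself
            continue
        head = dep.nodes[node['head']]
        head_word = head['word'] if head['word'] is not None else 'ROOT'
        print '%s(%s-%d, %s-%d)' % (node['rel'], head_word, node['head'],
                                    node['word'], address)

print_numbered(dep)

This should print lines like nsubj(shot-2, I-1) and root(ROOT-0, shot-2).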
I want to run the following command from Python:
ffmpeg -i test.avi -ss 0 -r 25 -vframes 100 ./out/image-%3d.jpg 2>&1 | grep output
which should, if I run it directly in a shell, output:
>>Output #0, image2, to './out/image-%3d.jpg':
However, when I do this in Python:
command = 'ffmpeg -i '+video_name+' -ss '+str(T) + ' -r '+str(25) + ' -vframes '+str(N)+' '+out_dir+'/image-%3d.jpg 2>&1 | grep output'
argx = shlex.split(command)
print argx
proc = subprocess.Popen(argx,stdout=subprocess.PIPE,shell = True)
(out,err) = proc.communicate()
it outputs this:
['ffmpeg', '-i', 'test.avi', '-ss', '0', '-r', '25', '-vframes', '100', './out/image-%3d.jpg', '2>&1', '|', 'grep', 'output']
ffmpeg version 1.2.6-7:1.2.6-1~trusty1 Copyright (c) 2000-2014 the FFmpeg developers
built on Apr 26 2014 18:52:58 with gcc 4.8 (Ubuntu 4.8.2-19ubuntu1)
configuration: --arch=amd64 --disable-stripping --enable-avresample --enable-pthreads --enable-runtime-cpudetect --extra-version='7:1.2.6-1~trusty1' --libdir=/usr/lib/x86_64-linux-gnu --prefix=/usr --enable-bzlib --enable-libdc1394 --enable-libfreetype --enable-frei0r --enable-gnutls --enable-libgsm --enable-libmp3lame --enable-librtmp --enable-libopencv --enable-libopenjpeg --enable-libopus --enable-libpulse --enable-libschroedinger --enable-libspeex --enable-libtheora --enable-vaapi --enable-vdpau --enable-libvorbis --enable-libvpx --enable-zlib --enable-gpl --enable-postproc --enable-libcdio --enable-x11grab --enable-libx264 --shlibdir=/usr/lib/x86_64-linux-gnu --enable-shared --disable-static
libavutil 52. 18.100 / 52. 18.100
libavcodec 54. 92.100 / 54. 92.100
libavformat 54. 63.104 / 54. 63.104
libavdevice 53. 5.103 / 53. 5.103
libavfilter 3. 42.103 / 3. 42.103
libswscale 2. 2.100 / 2. 2.100
libswresample 0. 17.102 / 0. 17.102
libpostproc 52. 2.100 / 52. 2.100
Hyper fast Audio and Video encoder
usage: ffmpeg [options] [[infile options] -i infile]... {[outfile options] outfile}...
Use -h to get full help or, even better, run 'man ffmpeg'
Apparently, ffmpeg didn't get the proper arguments. Where did I go wrong? Thanks!
When shell=True, you should pass the command as a string, not as a list of arguments:
command = 'ffmpeg -i '+video_name+' -ss '+str(T) + ' -r '+str(25) + ' -vframes '+str(N)+' '+out_dir+'/image-%3d.jpg 2>&1 | grep output'
proc = subprocess.Popen(command, stdout=subprocess.PIPE, shell=True)
out, err = proc.communicate()
Note that using shell=True is a security risk if command depends on user input.
If you wish to use shell=False, then you'll need to replace the shell pipeline with two subprocess.Popen calls, with proc1.stdout connected to proc2.stdin:
import subprocess
PIPE = subprocess.PIPE

filename = out_dir + '/image-%3d.jpg'
# All arguments must be strings when shell=False
args = ['ffmpeg', '-i', video_name, '-ss', str(T), '-r', '25',
        '-vframes', str(N), filename]
# ffmpeg writes its log to stderr, so merge it into stdout to mimic
# the shell's `2>&1` ahead of the pipe to grep
proc1 = subprocess.Popen(args, stdout=PIPE, stderr=subprocess.STDOUT,
                         shell=False)
proc2 = subprocess.Popen(['grep', 'output'], stdin=proc1.stdout,
                         stdout=PIPE, stderr=PIPE)
proc1.stdout.close()  # Allow proc1 to receive a SIGPIPE if proc2 exits.
out, err = proc2.communicate()
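Alternatively, since the goal is just to pick out the Output line, you could drop grep entirely and filter in Python; a minimal sketch reusing the args list from above (again with stderr, where ffmpeg logs, merged into stdout):

proc = subprocess.Popen(args, stdout=PIPE, stderr=subprocess.STDOUT)
out, _ = proc.communicate()
for line in out.splitlines():
    if 'output' in line.lower():
        print line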