I'm using ffmpeg to work on some videos. I'm resizing each video down to a smaller size, padding it, and overlaying a static image on the padding. Right now the static image has one textbox that says some information about the video. That was simple enough. I want to also add more information to the static image, though. I want to add information about the order of videos, with the current video being a different color. For example, I want text that looks like the following:
Video1→Video2→Video3→VIDEO4→Video5→Video6
Where VIDEO4 is a different color, so we can easily see where we are in the order. It doesn't seem to be that easily done with ffmpeg:
I don't believe I could do multiple colors using one "drawtext" filter
If I have multiple textboxes (one for previous videos in the list, one for the current video in the list, and one for the videos-to-come), they don't line up very well horizontally, as the actual names of the video have varying lengths of text. It's a bit of a nightmare.
I can't use ASS scripts/subtitles because they're being put on a static image, not a video
Is there any other solution to this other than just attempting to guess at the X value of these drawtext filters? Could I actually use some sort of subtitle script on an image? Am I able to reference other textboxes? If so, I could at least calculate the width of the textbox and position the next one accordingly. Everything I've seen so far has had some sort of timestamp beginning and end, and I just want it there for the whole video.
I'm using python and the ffmpeg-python library to interface with ffmpeg. This allows me to use a configuration file so I can dynamically add/remove videos to be created.
Just for more information, here's a snippet of how I'm making the videos:
overlay_input = ffmpeg.input(overlay_image)\
    .drawtext(text="blahblahblah",
              ... text options...)

video_input = ffmpeg.input(video, re=None)\
    .video\
    .filter("scale",
            ...scaling options...)\
    .filter("pad",
            ...padding options...)\
    .overlay(overlay_input)
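One way to avoid guessing x values is to measure each text segment's rendered width yourself and feed the cumulative offsets into one drawtext filter per segment. A minimal sketch of the measuring step — it assumes you know the exact font file and size your drawtext filters use, so the measurements stay in sync with ffmpeg's rendering (the font object just needs a `getlength` method, e.g. `PIL.ImageFont.truetype(...)` from Pillow):

```python
def segment_offsets(font, segments, x_start=0):
    """Return the x position of each text segment when drawn back-to-back.

    font must expose getlength(text) -> width in pixels, e.g. the object
    returned by PIL.ImageFont.truetype(font_path, font_size).
    """
    offsets = []
    x = x_start
    for text in segments:
        offsets.append(x)
        x += font.getlength(text)  # advance by this segment's width
    return offsets

# Hypothetical wiring into ffmpeg-python (names and options are
# placeholders, not your real config):
# from PIL import ImageFont
# font = ImageFont.truetype("DejaVuSans.ttf", 24)
# segments = ["Video1→Video2→Video3→", "VIDEO4", "→Video5→Video6"]
# xs = segment_offsets(font, segments, x_start=40)
# for text, x in zip(segments, xs):
#     overlay_input = overlay_input.drawtext(
#         text=text, x=x, y=40, fontfile="DejaVuSans.ttf", fontsize=24,
#         fontcolor="yellow" if text == "VIDEO4" else "white")
```

Because the middle segment gets its own drawtext call, only it needs the highlight color, and the computed offsets keep the three boxes flush regardless of how long the video names are.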
Any information would be very appreciated!
Related
I want to make a program that will perform different functions depending on which part of an image I click on. I'd be OK with cutting up the image into different parts and placing them together, but they are in the shape of a circle, so I'm not sure how well I'd even be able to cut it since I'm just using Paint, nor how accurately I could place those images back together.
I'll include the image I'm using below.
Is there a way to make a single image have multiple buttons associated with it using python? I've mostly been looking at tkinter and could not figure out a way to do it.
If not, is there a good method I can use to put the image back together after cutting it up and making each image a button?
I think I could do this using Reactjs, but I'd rather do it all in python if possible because of other parts of the program.
I can make one image a button, or a bunch of different images into a bunch of buttons, but I haven't yet figured out how to make one image act as a lot of buttons, or how to place all of those buttons together without their blank spaces overlapping. So I'd rather it all be one whole image if possible.
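One common pattern (a sketch, not a full program — the filename and wedge count are placeholders) is to keep the image whole on a tkinter Canvas, bind a single click handler, and map the click coordinates to a region yourself. For a circular layout the region falls out of the angle of the click relative to the circle's centre:

```python
import math

def wedge_index(x, y, cx, cy, n_wedges):
    """Map a click at (x, y) to one of n_wedges equal pie slices around
    centre (cx, cy), counting counter-clockwise from "3 o'clock".
    Returns None for a click exactly at the centre."""
    dx, dy = x - cx, y - cy
    if dx == 0 and dy == 0:
        return None
    # -dy because screen y grows downward; normalize angle to [0, 2*pi)
    angle = math.atan2(-dy, dx) % (2 * math.pi)
    return int(angle / (2 * math.pi / n_wedges))

if __name__ == "__main__":
    import tkinter as tk
    root = tk.Tk()
    canvas = tk.Canvas(root, width=400, height=400)
    canvas.pack()
    # img = tk.PhotoImage(file="circle.png")   # your image (hypothetical name)
    # canvas.create_image(200, 200, image=img)

    def on_click(event):
        idx = wedge_index(event.x, event.y, 200, 200, 6)
        print("clicked wedge", idx)  # dispatch to the matching function here

    canvas.bind("<Button-1>", on_click)
    root.mainloop()
```

The image never gets cut up, so there is nothing to reassemble; you only have to know the circle's centre and how many slices it has.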
I am trying to put together a script to fix a large number of PDFs that have been exported from Autocad via their DWG2PDF print driver.
When using this driver all SHX fonts are rendered as shape data instead of text data, they do however have a comment inserted into the PDF at the expected location with the expected text.
So far in my script I have got it to run through the PDF and insert hidden text on top of each section, with the text squashed to the size of the comment, this gets me 90% of the way and gives me a document that is searchable.
Unfortunately the sizing of the comment regions is relatively coarse (integer-based), which makes it difficult to accurately determine the orientation of short text, and results in unevenly sized boxes around text.
What I would like to be able to do is parse through the shape data in the PDF, collect anything within the bounds of the comment, and then determine a smaller and more accurate bounding box. However all the information I can find is by people trying to parse through text data, and I haven't been able to find anything at all in terms of shape data.
The below image is an example of the raw text in the PDF; the second image shows the comment bounding box in blue, with the red text being what I am setting to hidden to make the document searchable and copy/paste-able. I can get things a little better by shrinking the box by a fixed margin, but with small text items the low resolution of the comment box coordinate data messes things up.
To get this far I am using a combination of PyPDF2 and reportlab, but am open to moving to different libraries.
I didn't end up finding a solution with PyPDF2. I was able to find an easy way to iterate over shape data in pdfminer.six, but then couldn't find a nice way in pdfminer to extract annotation data.
As such I am using one library to get the annotations, one to look at the shape data, and last of all a third library to add the hidden text to the new PDF. It runs pretty slowly as sheet complexity increases, but it is giving me good enough results; see the image below, where the rough green borders as found in the annotations are shrunk to the blue borders surrounding the text. Of course I don't draw the boundaries, and I use invisible text for the actual program output, giving pretty good selectable/searchable text.
If anyone is interested in looping over the shape data in PDFs the below snippet should get you started.
from pdfminer.high_level import extract_pages
from pdfminer.layout import LTLine, LTCurve

for page_layout in extract_pages("TestSchem.pdf"):
    for element in page_layout:
        if isinstance(element, (LTCurve, LTLine)):
            print(element.bbox)
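The shrinking step itself needs no PDF library: once the shape bboxes have been collected, the tight box is just the union of every shape bbox that falls inside the coarse annotation rectangle. A sketch, assuming boxes are `(x0, y0, x1, y1)` tuples as pdfminer reports them:

```python
def tight_bbox(annot_box, shape_boxes):
    """Union of all shape bboxes fully contained in annot_box, or None if
    no shapes fall inside. All boxes are (x0, y0, x1, y1) tuples."""
    ax0, ay0, ax1, ay1 = annot_box
    inside = [b for b in shape_boxes
              if b[0] >= ax0 and b[1] >= ay0 and b[2] <= ax1 and b[3] <= ay1]
    if not inside:
        return None
    return (min(b[0] for b in inside), min(b[1] for b in inside),
            max(b[2] for b in inside), max(b[3] for b in inside))
```

For short rotated labels, comparing the width and height of the resulting tight box is also a cheaper orientation hint than the integer-valued annotation rectangle.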
What I need to do is combine two or more GIF files into one horizontally using Python. I thought about using PIL or ffmpeg, but I could never really get anywhere. I found partial solutions on the internet, but they aren't exactly what I'm looking for: a way to combine an arbitrary number of GIFs into one horizontal strip GIF. The sizes of the individual GIFs differ, and the number of GIFs being stacked horizontally changes depending on the user's input. Essentially a dynamic hstack filter for ffmpeg. Help would be appreciated!
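A sketch of such a dynamic hstack using Pillow (assumptions: every input is scaled to the height of the shortest GIF, shorter animations loop until the longest one finishes, and a single fixed frame duration is used):

```python
from PIL import Image, ImageSequence

def hstack_gifs(paths, out_path, duration=100):
    """Stack any number of GIFs side by side into one animated GIF."""
    gifs = [Image.open(p) for p in paths]
    frames_per = [[f.convert("RGBA").copy() for f in ImageSequence.Iterator(g)]
                  for g in gifs]
    height = min(g.size[1] for g in gifs)
    # Scale each GIF's frames to the common height, keeping aspect ratio.
    scaled = []
    for frames in frames_per:
        w = round(frames[0].size[0] * height / frames[0].size[1])
        scaled.append([f.resize((w, height)) for f in frames])
    total_w = sum(frames[0].size[0] for frames in scaled)
    n_out = max(len(frames) for frames in scaled)
    out_frames = []
    for i in range(n_out):
        strip = Image.new("RGBA", (total_w, height))
        x = 0
        for frames in scaled:
            frame = frames[i % len(frames)]  # loop shorter GIFs
            strip.paste(frame, (x, 0))
            x += frame.size[0]
        out_frames.append(strip)
    out_frames[0].save(out_path, save_all=True,
                       append_images=out_frames[1:],
                       duration=duration, loop=0)
```

Because the strip width is just the sum of the scaled widths, the number of inputs can vary freely at runtime, which is the "uncertain amount of gifs" part.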
So I have a video with multiple scenes. Each scene has text in the bottom left/right of it (e.g. scene A with text "ABC", scene B with text "XYZ", etc.)
I want to detect when each scene starts and ends (in timecode if possible) so I can use ffmpeg to split the video into multiple files.
I don't really care about extracting the text, just using it to know a start/end of a scene.
I tried to use PySceneDetect with the "detect-content" algorithm, the results are good but still a lot of manual work that I have to do (maybe due to the nature of my videos)
To be specific, my question is: how can I use OpenCV (or any other tool you suggest) to read/report a text change in the corner of a video? Or are you aware of a ready-made tool that will do the job?
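Since the text only needs to be detected as *changed*, not read, simple frame differencing on the corner region may be enough. A sketch of the comparison step (numpy only; reading frames would typically use `cv2.VideoCapture`, shown here only as a comment since the ROI coordinates and filename are placeholders):

```python
import numpy as np

def change_points(frames, roi, threshold=10.0):
    """Return indices of frames where the region of interest changes
    noticeably compared to the previous frame.

    frames: iterable of HxW (or HxWxC) numpy arrays
    roi:    (y0, y1, x0, x1) slice covering the corner holding the text
    """
    y0, y1, x0, x1 = roi
    changes = []
    prev = None
    for i, frame in enumerate(frames):
        crop = frame[y0:y1, x0:x1].astype(np.float32)
        if prev is not None and np.abs(crop - prev).mean() > threshold:
            changes.append(i)
        prev = crop
    return changes

# Hypothetical wiring with OpenCV (not run here):
# import cv2
# cap = cv2.VideoCapture("input.mp4")
# frames = iter(lambda: cap.read()[1], None)
# cuts = change_points(frames, roi=(600, 700, 0, 300))
```

A change index divided by the frame rate gives an approximate timecode for the cut; thresholding on the cropped corner rather than the full frame is what keeps this robust against motion elsewhere in the scene.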
I want to extract the text information contained in a postscript image file (the captions to my axis labels).
These images were generated with pgplot. I have tried ps2ascii and ps2txt on Ubuntu but they didn't produce any useful results. Does anyone know of another method?
Thanks
It's likely that pgplot drew the fonts in the text directly with lines rather than using text. Especially since pgplot is designed to output to a huge range of devices including plotters where you would have to do this.
Edit:
If you have enough plots to be worth the effort, then it's a very simple image processing task. Convert each page to something like TIFF in monochrome, then threshold the image to binary; the text will be at the max pixel value.
Use a template matching technique. If you have a limited set of possible labels, then just match the entire label; you can even start with a template of the correct size and rotation. Then just flag each plot as containing label[1-n], no need to read the actual text.
If you don't know the label, then you can still do OCR fairly easily: just extract the region around the axis, rotate it for the vertical axis, and use Google's free OCR lib.
If you have pgplot you can even build the training set for OCR, or the template images, directly rather than having to harvest them from the image list.
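The template-matching step can be sketched without any imaging library beyond numpy: slide the label template over the binarized page and score every placement (sum of absolute differences here; `cv2.matchTemplate` would do the same job much faster, so treat this as an illustration of the idea rather than a production matcher):

```python
import numpy as np

def best_match(image, template):
    """Return (row, col, score) of the best placement of template in image.
    Lower score = better match; both arrays are 2-D binarized (0/1)."""
    ih, iw = image.shape
    th, tw = template.shape
    best = (0, 0, float("inf"))
    for r in range(ih - th + 1):
        for c in range(iw - tw + 1):
            # Sum of absolute differences over this window
            score = np.abs(image[r:r + th, c:c + tw] - template).sum()
            if score < best[2]:
                best = (r, c, score)
    return best
```

With one template per known label, the label whose best score is lowest is the one the plot contains — which is exactly the "flag each plot as containing label[1-n]" step, with no OCR involved.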