I want to make an editor that does the following:
1) Takes an mp3 audio file
2) Takes a picture (a jpg file)
3) Outputs a simple video format, e.g. .mov, which consists of the jpg file with the mp3 file playing in the background
4) Does NOTHING else
I want to use this as a project to learn just the basics of all this stuff; however, I do not want to code basic things by hand. Where do I start and what key steps do I take to accomplish this?
I am decent with PHP and Java and do not mind learning Python for this. I actually would ideally want to write this in Python to gain experience.
Thanks!
If you want to code such a solution yourself, forget Python: compile ffmpeg and use its libraries directly from your code after you have carefully read them (or maybe use pyffmpeg, which still requires you to know ffmpeg internals).
However, I'm pretty sure that what you want could be done with the ffmpeg executable alone from the command line - but that way your Python code would end up as a thin wrapper around subprocess.Popen (it's quite a popular solution, actually).
I think it's a matter of what level of understanding you're aiming at: either you're OK with reading the ffmpeg docs and trusting that it's going to work (then: use Python), or you need to dive deep into the ffmpeg sources to gain a real understanding of what's going on (which I don't have, btw) - and then Pythonic bindings will just stand in your way.
I have needed ffmpeg (from Django) a few times already and never had to do anything more than assemble a list of ffmpeg command-line args. On the other hand, I would very much like to actually understand what the hell I'm doing, but no one seemed interested in paying me for grokking the ffmpeg sources. :-(
I'm pretty sure you could do this all from the mencoder command line (use the -speed option, I think; you might need to give it a duplicate of your jpg for every few seconds of video you want, as it can only slow things down by a factor of 100 at most).
If you opt for the ffmpeg CLI solution, or need a process to try to replicate with the libraries directly, the relevant CLI command is straightforward:
ffmpeg -i input.jpg -i input.mp3 output.mov
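In practice a single still image plus an audio track usually needs the image looped for the duration of the audio. A slightly fuller variant, wrapped in the kind of thin Python subprocess layer mentioned above (a sketch only: -loop 1 and -shortest are standard ffmpeg options, but your target player may need extra flags such as a pixel format):

# Thin Python wrapper around the ffmpeg CLI: repeat the still image for as
# long as the mp3 lasts and stop the output when the audio ends.
import subprocess

cmd = [
    'ffmpeg',
    '-loop', '1', '-i', 'input.jpg',  # loop the single jpg
    '-i', 'input.mp3',                # background audio
    '-shortest',                      # end when the shortest input (the audio) ends
    'output.mov',
]
subprocess.run(cmd, check=True)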
I have a bunch of videos from which I want to extract specific sections (either as videos or as frames). I get the specific sections from a .json file where the start and end frames are stored under labels like 'cat in video' or 'dog in video'. I have an existing method in Python using OpenCV (the method mentioned here), but I found a one-liner using ffmpeg which is a lot faster and more efficient than my Python script, except that I have to manually fill in the start and end frames in this command.
ffmpeg -i in.mp4 -vf select='between(n\,x\,y)' -vsync 0 frames%d.png
I read a few questions about working with .json files in a shell script or passing arguments to a batch script, which looks quite complicated and might mess up my system. Since I'm not familiar with working with .json files in a shell/batch script, I'm not sure how to start. Can anyone point me in the right direction on how to make a batch script that can read variables from a .json file and feed them into my ffmpeg command?
Since you're already familiar with Python, I suggest you use it to parse the JSON files; you can then use the ffmpeg-python library, which is an ffmpeg binding for Python. It also has a crop function, which I assume is what you need.
An alternative would be to use os.system('ffmpeg <arguments>') calls from a Python script, which lets you run external tools from the script.
Python natively supports JSON with its built-in json package.
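To make that concrete, here is a minimal sketch of the parse-JSON-then-shell-out route. The JSON layout (a label mapped to start/end frame numbers), the file names, and the output naming are assumptions for illustration, not a known format:

# Sketch: read labels with start/end frames from a JSON file and run the
# ffmpeg one-liner from the question for each entry via subprocess.
# Assumed JSON shape:
#   {"cat in video": {"start": 120, "end": 360}, "dog in video": {"start": 500, "end": 700}}
import json
import subprocess

with open('sections.json') as f:
    sections = json.load(f)

for label, frames in sections.items():
    start, end = frames['start'], frames['end']
    out_pattern = label.replace(' ', '_') + '_frame%d.png'
    subprocess.run([
        'ffmpeg', '-i', 'in.mp4',
        '-vf', "select=between(n\\,%d\\,%d)" % (start, end),
        '-vsync', '0',
        out_pattern,
    ], check=True)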
As for doing this in Python, here is an alternative approach you can try using my ffmpegio-core package:
import ffmpegio
ffmpegio.transcode('in.mp4','frames%d.png',vf=f"select='between(n\,{x}\,{y})'",vsync=0)
If the videos are constant frame rate, it could be faster to specify the start and end timestamps as input options:
fs = ffmpegio.probe.video_streams_basic('in.mp4')[0]['frame_rate']
ffmpegio.transcode('in.mp4', 'frames%d.png', ss_in=x/fs, to_in=y/fs, vsync=0)
If you don't know the frame rate, you are calling ffprobe and ffmpeg for each file, so there is a tradeoff. But if your input video is long, it could be worthwhile.
But if speed is your primary goal, calling FFmpeg directly is always the fastest.
ffmpegio GitHub repo
The final goal would be to capture the regular webcam feed, manipulate it in some way (blur face, replace background, ...) and then output the result in some way so that the manipulated feed can be chosen as input for whatever application expects a webcam (Discord, Teams, ...).
I am working on a Windows machine and would prefer to do this in Python. This combination has me lost at the moment.
Capturing and manipulating is easy with https://pypi.org/project/opencv-python/
The step of exposing the feed, however, seems overly complicated
Apparently, on Linux there are Python libraries offering just that functionality, but they do not work on Windows. Everything that sounded like it could hint at a good solution went directly into C++ country. There are programs which basically do what I want, e.g. webcamoid (https://webcamoid.github.io/), and I could hack together a solution which captures and processes the feed via Python, then uses webcamoid to record the output and feed it into a virtual webcam. But I'd much prefer to do the whole thing in one program.
I have been searching around a bit and found these questions on stackoverflow on the topic:
Using OpenCV Output as Webcam (uses C++ but also gives a Python solution - however, pyfakewebcam does not work on Windows)
How do I stream to a new video source? (not really answered, just links to other question)
How to simulate a webcam device (more C++ hints, links to msdn's Writing a Custom Media Source)
Artificial webcam on windows (basically what I want, but in C++ again)
Writing a virtual webcam? (more explanation on how this might work in C++)
I am getting the strong impression that I either need C++ for this or have to work on Linux. However, since I lack a Linux machine, any setup, and experience in programming C++, this seems like a large amount of work for the "toy project" this was supposed to be. But maybe I am just missing an obvious library or functionality somewhere?
Hence, the question is: Is there a way to expose a "webcam" stream via Python on Windows?
And, one last idea: what if I used a Docker container with a Linux Python environment to implement the functionality I want? Could that container then stream a "virtual webcam" to the host?
You can do this by using pyvirtualcam.
First, you need to install it using pip:
pip install pyvirtualcam
Then go to This Link and download the zip file from the latest release
Unzip it and navigate to \bin\[your computer's bitness]
Open Command Prompt in that directory and type
regsvr32 /n /i:1 "obs-virtualsource.dll"
This will register a fake camera on your computer.
If you want to unregister the camera, run this command:
regsvr32 /u "obs-virtualsource.dll"
Now you can send frames to the camera using pyvirtualcam
This is a sample:
import pyvirtualcam
import numpy as np
with pyvirtualcam.Camera(width=1280, height=720, fps=30) as cam:
    while True:
        frame = np.zeros((cam.height, cam.width, 4), np.uint8)  # RGBA
        frame[:, :, :3] = cam.frames_sent % 255  # grayscale animation
        frame[:, :, 3] = 255
        cam.send(frame)
        cam.sleep_until_next_frame()
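To tie this back to the capture-and-manipulate goal from the question, here is a rough sketch combining opencv-python with pyvirtualcam (the blur step is a placeholder manipulation, and the RGBA layout matches the sample above; the exact pixel format pyvirtualcam expects can differ between versions):

# Rough sketch: read the physical webcam with OpenCV, blur each frame, and
# push the result out through the virtual camera registered above.
import cv2
import numpy as np
import pyvirtualcam

cap = cv2.VideoCapture(0)  # the physical webcam
width = int(cap.get(cv2.CAP_PROP_FRAME_WIDTH))
height = int(cap.get(cv2.CAP_PROP_FRAME_HEIGHT))

with pyvirtualcam.Camera(width=width, height=height, fps=30) as cam:
    while True:
        ok, frame_bgr = cap.read()
        if not ok:
            break
        frame_bgr = cv2.GaussianBlur(frame_bgr, (21, 21), 0)  # placeholder "manipulation"
        frame_rgba = np.zeros((height, width, 4), np.uint8)
        frame_rgba[:, :, :3] = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2RGB)
        frame_rgba[:, :, 3] = 255
        cam.send(frame_rgba)
        cam.sleep_until_next_frame()

cap.release()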
I am writing a little Python script that converts mp3 files to aac. I am doing so by using avconv, and I am wondering if I am doing it right.
Currently my command looks like this:
avconv -i input.mp3 -ab 80k output.aac
This brings me to my first question: I am using -ab 80k as this works with my test files. On some files I can go higher and use 100k, but I'd prefer to always use the highest setting. Is there a way to specify that?
The other question: I am using it in a Python script. Currently I call it as a subprocess, but I'd prefer not to, as this forces me to write a file to disk and then load it again when everything is done. Is there a way to do it only in memory? I am returning the file afterwards using web.py and don't need or want it on my disk, so it would be cool not to have to use temporary files at all.
Thanks for any tips and tricks :)
I don't have the -ab option, but if it is equivalent to -ar (which specifies the sample rate), I should point out that your ears won't be able to tell the difference between 80k and anything higher.
On the subject of temporary files, have you considered using /tmp or a specific tmpfs file system created for the purpose?
Edit:
In response to the comment about temp files: yes, you still use them, but create them in /tmp or in a tmpfs file system you have created for the job. It should get cleared on reboot, but I would expect you to delete the file once you have passed it on anyway.
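If you stay with temporary files, something along these lines keeps them in /tmp and cleans them up afterwards (a sketch only: the avconv flags are the ones from the question, the helper name is made up, and whether /tmp is actually tmpfs depends on your distribution):

# Sketch: transcode via a temp file in /tmp, read the result into memory for
# web.py to return, and remove the temp file afterwards.
import os
import subprocess
import tempfile

def mp3_to_aac_bytes(mp3_path):
    fd, aac_path = tempfile.mkstemp(suffix='.aac', dir='/tmp')
    os.close(fd)
    try:
        subprocess.check_call(['avconv', '-y', '-i', mp3_path, '-ab', '80k', aac_path])
        with open(aac_path, 'rb') as f:
            return f.read()
    finally:
        os.remove(aac_path)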
On the other point, about lossless AAC, I may come back to you later.
Edit 2:
As I suspected, the AAC format is described as the logical successor to MP3, and I strongly suspect (you or someone else may know differently) that since MP3 is what is termed lossy compression, i.e. bits (no pun intended) are missing, your desire to convert losslessly is doomed, inasmuch as the source is already lossy.
Of course, being no expert in the matter, I stand to be corrected.
Edit 3:
Your comment about too many frames leads me to believe that you are conflating the two avconv options -ar and -b.
The -b option specifies the output bit rate and is applied to both video and audio. The way you are using it, I suspect it is attempting to apply the same bit rate to audio and video, but there is a limit on the audio stream.
You would have to use -b:v to tell avconv to set the video bit rate and leave the audio rate alone.
I suggest that you drop the -ab option and use -ar instead, as that is audio-only.
I'm working on a side project where we want to process images in a Hadoop MapReduce program (for eventual deployment to Amazon's Elastic MapReduce). The input to the process will be a list of all the files, each with a little extra data attached (the lat/long position of the bottom-left corner - these are aerial photos).
The actual processing needs to take place in Python code so we can leverage the Python Image Library. All the Python streaming examples I can find use stdin and process text input. Can I send image data to Python through stdin? If so, how?
I wrote a Mapper class in Java that takes the list of files and saves the names, the extra data, and the binary contents to a sequence file. I was thinking maybe I need to write a custom Java mapper that takes in the sequence file and pipes it to Python. Is that the right approach? If so, what should the Java code to pipe the images out and the Python code to read them in look like?
In case it's not obvious, I'm not terribly familiar with Java OR Python, so it's also possible I'm just biting off way more than I can chew with this as my introduction to both languages...
There are a few possible approaches that I can see:
Use both the extra data and the file contents as input to your Python program. The tricky part here will be the encoding. I frankly have no idea how streaming works with raw binary content, and I'm assuming the basic answer is "not well." The main issue is that the stdin/stdout communication between processes is very text-based, relying on delimiting input with tabs and newlines, and things like that. You would need to worry about the encoding of the image data, and probably have some sort of pre-processing step, or a custom InputFormat, so that you could represent the image as text.
Use only the extra data and the file location as input to your Python program, and have the program independently read the actual image data from the file (a rough sketch of such a mapper follows this list). The hiccup here is making sure that the file is available to the Python script. Remember this is a distributed environment, so the files would have to be in HDFS or somewhere similar, and I don't know if there are good libraries for reading files from HDFS in Python.
Do the Java-Python interaction yourself. Write a Java mapper that uses the Runtime class to start the Python process itself. This way you get full control over exactly how the two worlds communicate, but obviously it's more code and a bit more involved.
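A rough sketch of what the mapper in the second option could look like under Hadoop streaming. The tab-separated input layout (file path, lat, long) and the trivial "processing" step are assumptions for illustration; it also assumes the image file is reachable from the task node (e.g. via a shared mount or the distributed cache):

#!/usr/bin/env python
# Streaming mapper sketch: stdin carries one text record per image (its path
# plus the lat/long of the bottom-left corner); the binary image data itself
# is opened from the path rather than piped through stdin.
import sys
from PIL import Image

for line in sys.stdin:
    path, lat, lon = line.rstrip('\n').split('\t')
    img = Image.open(path)        # read the image off-stream
    width, height = img.size      # stand-in for the real processing
    # Emit key<TAB>value text back to Hadoop streaming.
    print('%s\t%s,%s,%dx%d' % (path, lat, lon, width, height))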
I am looking for a high-level audio library that supports crossfading for Python (and that works on Linux). In fact, crossfading a song and saving it is about the only thing I need.
I tried pyechonest, but I find it really slow. Working with multiple songs at the same time is hard on memory too (I tried to crossfade about 10 songs into one, but I got out-of-memory errors and my script was using 1.4 GB of memory). So now I'm looking for something else that works with Python.
I have no idea if anything like that exists; if not, are there good command-line tools for this? I could write a wrapper around such a tool.
A list of Python sound libraries.
Play a Sound with Python
PyGame or Snack would work, but for this, I'd use something like audioop.
Basic first steps here: merge background audio file
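If a plain standard-library route is enough, here is a minimal crossfade sketch with wave + audioop. It assumes (not from the original post) that both songs have already been decoded to WAV files with matching sample rate, sample width, and channel count, and that the fade length fits inside both; the file names and helper name are made up. Note that audioop is deprecated in recent Python versions:

# Crossfade the end of song A into the start of song B using only wave + audioop.
import wave
import audioop

def crossfade_wavs(path_a, path_b, out_path, fade_seconds=5.0, chunk_frames=1024):
    with wave.open(path_a, 'rb') as wa, wave.open(path_b, 'rb') as wb:
        params = wa.getparams()
        width, rate, channels = wa.getsampwidth(), wa.getframerate(), wa.getnchannels()
        fade_bytes = int(fade_seconds * rate) * width * channels
        a_all = wa.readframes(wa.getnframes())
        b_all = wb.readframes(wb.getnframes())

    a_body, a_tail = a_all[:-fade_bytes], a_all[-fade_bytes:]
    b_head, b_body = b_all[:fade_bytes], b_all[fade_bytes:]

    # Ramp A down and B up in small chunks, mixing the overlapping region.
    mixed = bytearray()
    step = chunk_frames * width * channels
    for off in range(0, len(a_tail), step):
        pos = off / len(a_tail)  # 0.0 .. 1.0 through the crossfade
        a_chunk = audioop.mul(a_tail[off:off + step], width, 1.0 - pos)
        b_chunk = audioop.mul(b_head[off:off + step], width, pos)
        mixed += audioop.add(a_chunk, b_chunk, width)

    with wave.open(out_path, 'wb') as out:
        out.setparams(params)
        out.writeframes(a_body + bytes(mixed) + b_body)

crossfade_wavs('song1.wav', 'song2.wav', 'crossfaded.wav')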
A scriptable solution using external tools AviSynth and avs2wav or WAVI:
Create an AviSynth script file:
test.avs
v=ColorBars()
a1=WAVSource("audio1.wav").FadeOut(50)
a2=WAVSource("audio2.wav").Reverse.FadeOut(50).Reverse
AudioDub(v,a1+a2)
The script fades out audio1 and stores that in a1, then fades in audio2 (via the Reverse/FadeOut/Reverse trick) and stores that in a2.
a1 and a2 are concatenated and then dubbed onto a ColorBars screen pattern to make a video.
You can't just work with audio alone - a valid video must be generated.
I kept the script as simple as possible for demonstration purposes. Google for more details on audio processing via AviSynth.
Now using avs2wav (or WAVI) you can render the audio:
avs2wav.exe test.avs combined.wav
or
wavi.exe test.avs combined.wav
Good luck!
Some references:
How to edit with Avisynth
AviSynth filters reference