I want to write a Python script that takes a spectrogram image as input and generates the audio from it. Is there a way to convert a spectrogram image into the corresponding audio?
I believe there must be a way to reverse-engineer a spectrogram image to recover the audio. Can someone please help me with this?
By a strange coincidence, I also needed to do this to recover audio for which only the spectrograms were available, but I could find no tools to do it, so I wrote one myself in C. It's not simple, and the results are, as user14325 rightly points out, very noisy compared to the originals, partly due to the low time resolution of most spectrograms but mostly because the phase information for each data point is lost and has to be invented.
However, if you are interested, you will find a brief description at
https://wikidelia.net/wiki/Spectrograms#Inverse_spectrograms
and you can find the code by following the "even hairier custom software" link and checking the files named "run.*" (the rest of the code there is for log-frequency-axis forward spectrograms).
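If you want to stay in Python rather than C, a common way to invent the missing phase is the Griffin-Lim algorithm. Below is a minimal, hedged sketch using librosa; it assumes the image is a linear-frequency magnitude spectrogram in dB with time on the x-axis, and the file names, dB range, hop length, and sample rate are all illustrative guesses that you would have to match to the spectrogram you actually have.

```python
# Hypothetical sketch: invert a grayscale spectrogram image to audio with the
# Griffin-Lim algorithm (librosa). Assumes a linear-frequency magnitude
# spectrogram in dB; real spectrogram images will need their own scaling/axis fixes.
import numpy as np
import librosa
import soundfile as sf
from PIL import Image

img = np.asarray(Image.open("spectrogram.png").convert("L"), dtype=np.float32)
img = np.flipud(img)                # image row 0 is the top; flip so row 0 is the lowest frequency

# Map pixel intensities (0..255) to an assumed dB range, then back to magnitude
db = img / 255.0 * 80.0 - 80.0      # assume an 80 dB dynamic range
mag = librosa.db_to_amplitude(db)

# Griffin-Lim iteratively estimates the missing phase from the magnitudes
audio = librosa.griffinlim(mag, n_iter=64, hop_length=256)
sf.write("reconstructed.wav", audio, 22050)   # sample rate is a guess, not known from the image
```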
I want to find the number of times a snippet of audio is repeated in another audio file.
There are libraries like https://github.com/worldveil/dejavu which can be used to create audio fingerprints that can then be used for recognition, but they only tell you whether the snippet exists in the audio or not; they do not give a count.
Is there any way to modify this approach to find the number of times the recorded audio repeats in the source (any audio from the database)?
Thanks
If it's an exact copy, you may want to cross-correlate the snippet with the source audio you're searching (equivalently, convolve the source with the time-reversed snippet). The peaks of the correlation will then show you where each occurrence starts, and counting them gives you the repeat count.
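A hedged sketch of that idea in Python using scipy's cross-correlation; the file names, the mono mix-down, and the 0.5 peak threshold are illustrative assumptions rather than tuned values.

```python
# Count occurrences of a snippet in a longer recording via cross-correlation.
# Assumes both files share the same sample rate.
import numpy as np
import soundfile as sf
from scipy.signal import correlate, find_peaks

snippet, sr1 = sf.read("snippet.wav")
source, sr2 = sf.read("source.wav")
assert sr1 == sr2, "resample first if the rates differ"

# Mix down to mono and normalize so the threshold is comparable across files
snippet = np.atleast_2d(snippet.T).mean(axis=0)
source = np.atleast_2d(source.T).mean(axis=0)
snippet = snippet / (np.abs(snippet).max() + 1e-9)
source = source / (np.abs(source).max() + 1e-9)

corr = correlate(source, snippet, mode="valid") / len(snippet)

# Count peaks that reach at least half the strongest correlation; `distance`
# keeps overlapping detections of the same occurrence from double-counting.
peaks, _ = find_peaks(corr, height=0.5 * corr.max(), distance=len(snippet) // 2)
print(f"snippet found roughly {len(peaks)} times")
```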
For my school project, I need to find images in a large dataset. I'm working with Python and OpenCV. So far I've managed to find an exact match of an image in the dataset, but it takes a lot of time even though I only had 20 images for the test code. So I've searched a few pages of Google and tried the code on these pages:
image hashing
building an image hashing search engine
feature matching
I've also been thinking of searching through the hashed dataset, saving the candidates' paths, and then finding the best feature-matching image among them. But most of the time, the narrowed-down set of candidates is very different from my query image.
The image hashing is really great. It looks like what I need, but there is a problem: I need to find an exact match, not similar photos. So I'm asking you: if you have any suggestions, or a piece of code that might help or improve the reference code I've linked, can you share it with me? I'd be really happy to try or research whatever you send or suggest.
OpenCV is probably the wrong tool for this. The algorithms there are geared towards finding similar matches, not exact ones. The general idea is to use machine learning to teach the code to recognize what a car looks like so it can detect cars in videos, even when the color or form changes (driving in shadow, a different make, etc.).
I've found two approaches work well when trying to build an image database.
Use a normal hash algorithm like SHA-256, plus maybe some metadata (file or image size), to find matches.
Resize the image down to 4x4 or even 2x2 pixels and use the pixel RGB values as the "hash".
The first approach reduces the image to a number. You can then put the number in a lookup table. When searching for an image, apply the same hashing algorithm to the image you're looking for and use the resulting number to look in the table. If it's there, you have a match.
Note: In all cases, hashing can produce the same number for different pictures, so you have to compare all the pixels of the two pictures to make sure it's really an exact match. That's why it sometimes helps to add information like the picture size (in pixels, not the file size in bytes).
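For illustration, a minimal Python sketch of this first approach; the dataset path, file pattern, and the pixel-level verification step are assumptions, not part of the original answer.

```python
# Index images by SHA-256 of their decoded pixel data plus the image
# dimensions, then verify candidates pixel-by-pixel to rule out collisions.
import hashlib
from pathlib import Path
import numpy as np
from PIL import Image

def image_key(path):
    """Hash the decoded pixels (not the file bytes) so re-encoded copies still match."""
    img = Image.open(path).convert("RGB")
    return (hashlib.sha256(img.tobytes()).hexdigest(), img.size)

# Build the lookup table once for the dataset
table = {}
for p in Path("dataset").glob("*.png"):
    table.setdefault(image_key(p), []).append(p)

# Query: same key -> candidate; compare full pixels to confirm the exact match
query = "query.png"
for candidate in table.get(image_key(query), []):
    same = np.array_equal(np.asarray(Image.open(query).convert("RGB")),
                          np.asarray(Image.open(candidate).convert("RGB")))
    if same:
        print("exact match:", candidate)
```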
The second approach lets you find pictures that look the same to the eye but are in fact slightly different. Imagine cropping off a single pixel column on the left or tilting the image by 0.01°. To you, the images will look the same, but to a computer they will be totally different. The second approach tries to average such small changes out. The cost is that you will get more collisions, especially for B&W pictures.
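And a corresponding sketch of the second approach; the 4x4 thumbnail size is the tunable knob mentioned above, and the file names are placeholders.

```python
# Shrink each image to a tiny thumbnail and use its raw RGB bytes as a coarse
# "hash". Near-identical images usually land in the same bucket; collisions
# must still be verified by comparing the full images.
from PIL import Image

def tiny_hash(path, size=(4, 4)):
    """Downscale the image and return its RGB bytes as a hashable key."""
    thumb = Image.open(path).convert("RGB").resize(size, Image.LANCZOS)
    return thumb.tobytes()

bucket = {}
for p in ["a.png", "b.png", "c.png"]:
    bucket.setdefault(tiny_hash(p), []).append(p)
```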
Finding exact image matches using hash functions can be done with the undouble library (disclaimer: I am also the author). It works using a multi-step process of pre-processing the images (grayscaling, normalizing, and scaling), computing the image hash, and grouping the images based on a threshold value.
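A rough usage sketch of that workflow; the exact method names and defaults are assumptions that should be checked against the undouble README, and the directory path is a placeholder.

```python
# Assumed undouble workflow: import a folder, hash, group by threshold.
from undouble import Undouble

model = Undouble(method='phash')   # hash method is an illustrative choice
model.import_data('./dataset')     # pre-processing: grayscale, normalize, scale
model.compute_hash()               # compute the image hashes
model.group(threshold=0)           # threshold=0 groups only identical hashes
# Inspect the grouped results as described in the library's documentation.
```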
I have a decent amount of experience with OpenCV and am currently familiarizing myself with stereo vision. I happen to have two JeVois cameras (don't ask why) and was wondering if it was possible to run some sort of code on each camera to distribute the workload and cut down on processing time. It needs to work so that each camera does part of the overall process (without the cameras needing to talk to each other), and the computer they're connected to receives that information and handles the rest of the work. If this is possible, does anyone have any solutions or tips? Thanks in advance!
To generalize the stereo-vision pipeline (look here for a more in-depth treatment):
Find the intrinsic/extrinsic values of each camera (good illustration here)
Solve for the transformation that will rectify your cameras' images (good illustration here)
Capture a pair of images
Transform the images according to Step 2.
Perform stereo-correspondence on that pair of rectified images
If we can assume that your cameras will remain perfectly stationary relative to each other, you'll only need to perform Steps 1 and 2 once, after camera installation.
That leaves image capture (duh) and image rectification as the general stereo-vision tasks that can be done without the two cameras communicating.
Additionally, there are some pre-processing techniques (you could try this and this) that have been shown to improve the accuracy of some stereo-correspondence algorithms. These could also be done on each of your image-capture platforms individually.
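To make the split concrete, here is a hedged Python/OpenCV sketch of which piece could run where; the calibration values it takes (K, dist, R, P, size) are assumed to come from the one-time Steps 1 and 2, and the SGBM settings are placeholders, not tuned values.

```python
import cv2

def build_rectify_maps(K, dist, R, P, size):
    """One-time, per camera: undistort + rectify maps from the calibration (Steps 1-2)."""
    return cv2.initUndistortRectifyMap(K, dist, R, P, size, cv2.CV_16SC2)

def rectify(frame, maps):
    """Per frame, on each capture platform independently: warp the raw frame."""
    return cv2.remap(frame, maps[0], maps[1], cv2.INTER_LINEAR)

def disparity(rect_left, rect_right, num_disp=64, block=7):
    """On the host: stereo correspondence on the already-rectified pair."""
    matcher = cv2.StereoSGBM_create(numDisparities=num_disp, blockSize=block)
    gray_l = cv2.cvtColor(rect_left, cv2.COLOR_BGR2GRAY)
    gray_r = cv2.cvtColor(rect_right, cv2.COLOR_BGR2GRAY)
    return matcher.compute(gray_l, gray_r)
```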
I was implementing depth-map construction, the Python code for which is available here: OpenCv Docs - depthMap. I was successful in getting the depth map as shown in the docs for their given image pair (left and right stereo images), tsukuba_l.png and tsukuba_r.png. I then wanted to test my own image pairs, so I took a pair of images with my mobile phone, as shown below:
When I run the code, I get a depth map something like this:
I tried playing with numDisparities and blockSize, but it didn't help in getting a better map.
I thought of checking the source of cv2.StereoBM_create in the OpenCV master repository on GitHub, but couldn't find it online. Can you help me with a way to compute depth maps for images I take myself? Is there a way to tune the parameters, or at least could you point me to the GitHub module that contains all the stereo-related code? Thank you.
I guess you did not rectify the images, which is fundamental for stereo matching.
You should first calibrate your stereo system (if you take the pictures with a mobile phone, every image pair you take will have a different transform; the two cameras always need to have the same transformation between each other) and then rectify the images. That way they are projected onto the same plane, and the stereo-matching algorithm can look for correspondences in the other image along the same rows.
Check the docs for stereoRectify(); you will see some example images of the rectification process.
By the way, there is another Python example based on the Semi-Global Block Matching algorithm in opencv/samples/python/stereo_match.py.
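If you want to try the calibrate-then-rectify flow in Python, here is a hedged sketch using chessboard images; the board size, square size, and image lists are placeholder assumptions, and the matching step itself is shown in the stereo_match.py sample mentioned above.

```python
# Calibrate a stereo pair from chessboard shots, then compute rectification.
import cv2
import numpy as np

def calibrate_and_rectify(left_images, right_images, board=(9, 6), square=0.025):
    # 3-D chessboard corner coordinates in the board's own frame
    objp = np.zeros((board[0] * board[1], 3), np.float32)
    objp[:, :2] = np.mgrid[0:board[0], 0:board[1]].T.reshape(-1, 2) * square

    objpoints, left_pts, right_pts = [], [], []
    size = None
    for lp, rp in zip(left_images, right_images):
        gl = cv2.imread(lp, cv2.IMREAD_GRAYSCALE)
        gr = cv2.imread(rp, cv2.IMREAD_GRAYSCALE)
        size = gl.shape[::-1]
        ok_l, corners_l = cv2.findChessboardCorners(gl, board)
        ok_r, corners_r = cv2.findChessboardCorners(gr, board)
        if ok_l and ok_r:
            objpoints.append(objp)
            left_pts.append(corners_l)
            right_pts.append(corners_r)

    # Intrinsics for each camera, then the fixed transform between them
    _, K1, d1, _, _ = cv2.calibrateCamera(objpoints, left_pts, size, None, None)
    _, K2, d2, _, _ = cv2.calibrateCamera(objpoints, right_pts, size, None, None)
    _, K1, d1, K2, d2, R, T, E, F = cv2.stereoCalibrate(
        objpoints, left_pts, right_pts, K1, d1, K2, d2, size,
        flags=cv2.CALIB_FIX_INTRINSIC)

    # Rectification transforms: after remapping with these, epipolar lines are rows
    R1, R2, P1, P2, Q, _, _ = cv2.stereoRectify(K1, d1, K2, d2, size, R, T)
    return (K1, d1, R1, P1), (K2, d2, R2, P2), Q
```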
I am trying to compute a rough "quality" metric for a video, which takes the following into consideration:
"Smoothness" of video; i.e., the opposite of how "choppy" it is
Image quality; i.e., if there are a lot of compression artifacts, the quality score should decrease
I came across https://github.com/aizvorski/scikit-video, but the code seems to be littered with FIXMEs and TODOs, and on top of that there are barely any comments or documentation.
Is there a Python library, or even a program with a CLI, for computing video quality, or perhaps a set of libraries that will help me compute the above two metrics separately?
Image Quality
I would think that "Image Quality" is largely a function of bit-depth (or effective bit-depth) and bit-rate.
You can parse ffmpeg output to get this information. PIL or PyQt/PySide can also do this.
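For example, a small sketch that pulls the bit rate and pixel format (a rough proxy for bit depth) out of ffprobe's JSON output; it assumes ffprobe is on your PATH and the file name is a placeholder.

```python
# Query the first video stream's bit rate, pixel format, and frame rate.
import json
import subprocess

def video_stream_info(path):
    out = subprocess.run(
        ["ffprobe", "-v", "error", "-select_streams", "v:0",
         "-show_entries", "stream=bit_rate,pix_fmt,avg_frame_rate",
         "-of", "json", path],
        capture_output=True, text=True, check=True)
    return json.loads(out.stdout)["streams"][0]

info = video_stream_info("input.mp4")
# bit_rate may be missing for some containers, hence .get()
print(info.get("bit_rate"), info.get("pix_fmt"), info.get("avg_frame_rate"))
```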
Smoothness
For smoothness, you may need to use some type of optical flow algorithm and get deltas from frame to frame.
OpenCV looks like a project that does many of these things.
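As a rough sketch of the optical-flow suggestion, assuming OpenCV's Farneback flow and treating the frame-to-frame variation of the mean flow magnitude as a crude "choppiness" signal (the statistic itself is an illustrative choice, not an established metric):

```python
# Compute per-frame mean optical-flow magnitude, then score choppiness as the
# average jump between consecutive magnitudes.
import cv2
import numpy as np

def flow_magnitudes(path):
    cap = cv2.VideoCapture(path)
    ok, prev = cap.read()
    prev = cv2.cvtColor(prev, cv2.COLOR_BGR2GRAY)
    mags = []
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        flow = cv2.calcOpticalFlowFarneback(prev, gray, None,
                                            0.5, 3, 15, 3, 5, 1.2, 0)
        mags.append(np.linalg.norm(flow, axis=2).mean())
        prev = gray
    cap.release()
    return np.array(mags)

mags = flow_magnitudes("clip.mp4")
# Smooth video: motion changes gradually, so successive magnitudes stay close
choppiness = np.abs(np.diff(mags)).mean()
print("choppiness score:", choppiness)
```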