I'm currently trying to calculate the cross correlation of two audio files using this post as a reference.
In python, the algorithm works as expected:
source, source_sample = librosa.load(SOURCE_PATH, sr = None)
target, _ = librosa.load(TARGET_PATH, sr = source_sample)
# the next power of two greater than len(source) + len(target)
DESIRED_LEN = 262144
source = np.pad(source, (0, DESIRED_LEN - len(source)), 'constant')
target = np.pad(target, (0, DESIRED_LEN - len(target)), 'constant')
source_fft = scipy.fft.fft(source)
target_fft = np.fft.fft(target)
target_conj = np.conjugate(target_fft)
res = source_fft * target_conj
inverse = np.fft.ifft(res)
# yields the spot of greatest similarity
print(np.argmax(inverse))
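For illustration, the same FFT cross-correlation technique can be checked on synthetic data (the signal names and the 250-sample shift below are made up for the demo); the argmax of the inverse transform recovers the lag:

```python
import numpy as np

# Synthetic test: `delayed` is `sig` shifted right by 250 samples
rng = np.random.default_rng(0)
sig = rng.standard_normal(1000)
delayed = np.concatenate([np.zeros(250), sig])

# Next power of two >= len(sig) + len(delayed) - 1, as in the snippet above
n = 1 << (len(sig) + len(delayed) - 1).bit_length()
f_sig = np.fft.fft(sig, n)        # fft(x, n) zero-pads x to length n
f_delayed = np.fft.fft(delayed, n)

# Circular cross-correlation via the correlation theorem
corr = np.fft.ifft(f_delayed * np.conjugate(f_sig))
print(np.argmax(np.abs(corr)))    # prints 250, the introduced lag
```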
Using C#, the algorithm doesn't work, because the results of the FFT are different.
I've tried FftSharp, FFTW and Mathnet.Numerics, all delivering the same result for FFT that is different from scipy/numpy (see table below).
using var sourceWav = new WaveFileReader(sourceFile);
var sourceSamples = sourceWav.ReadAsDoubles();
sourceSamples = FftSharp.Pad.ZeroPad(sourceSamples);
var fftSource = FftSharp.Transform.FFT(sourceSamples);
These are the results of FFT in comparison:
| Python (Scipy) | .Net (FftSharp) |
| --------------- | --------------- |
| -0.028 + 0j | -0.028 + 0j |
| 0.027 + 0.071j | 0.074 - 0.016j |
| 0.113 + 0.082j | -0.083 - 0.112j |
| 0.223 + 0.006j | -0.104 + 0.197j |
| 0.255 - 0.145j | 0.292 + 0.026j |
So, since all the C# implementations agree with each other, does this have something to do with scaling/shifting? What do I have to do to transform the C# results into the numpy/scipy results to make the algorithm work?
Thanks in advance!
Edit1:
Yes, as stated above, the input signal is read from the same WAV file (for Python via librosa, for C# it's NAudio). I've double checked the values; they are the same in both environments.
I've added an image of the first 500 values of the FFT for Python (left) and C# (right). The results are of the same shape, but the right one seems more "dense". What's the problem here?
My current attempt:
This is my current code:
from moviepy.editor import *
clips = [VideoFileClip('a.mp4'), VideoFileClip('b.mp4'), VideoFileClip('c.mp4')]
transitioned_clips = [demo_clip.crossfadein(2) for demo_clip in clips]
for_delivery = concatenate_videoclips(transitioned_clips)
for_delivery.write_videofile(target_path, fps=clip.fps, bitrate='%dK' % (bitrate), threads=50, verbose=False, logger=None, preset='ultrafast')
I also tried using CompositeVideoClip, but:
- it resulted in a completely black video, and
- even for the completely black video, writing the file took 50 times longer than without transitions.
My current output:
My current output is a video with the 3 videos concatenated (which is good), but no transitions between the clips (which is not good).
My goal:
My goal is to add the crossfadein transition for 2 seconds between the clips and concatenate the clips into one video and output it.
In other words, I want it like (in order from left to right):
| | + | | + | |
| clip 1 | transition 1 | clip 2 | transition 2 | clip 3 |
| | + | | + | |
Is there any way to have transitions? Any help appreciated.
You could try this approach of manually setting the start time to handle the transitions.
padding = 2
video_clips = [VideoFileClip('a.mp4'), VideoFileClip('b.mp4'), VideoFileClip('c.mp4')]
video_fx_list = [video_clips[0]]

idx = video_clips[0].duration - padding
for video in video_clips[1:]:
    video_fx_list.append(video.set_start(idx).crossfadein(padding))
    idx += video.duration - padding

final_video = CompositeVideoClip(video_fx_list)
final_video.write_videofile(target_path, fps=video_clips[0].fps)  # add any remaining params
Edit:
Here's an attempt using concatenate_videoclips (note that crossfades need method="compose"; method="chain" just joins the clips end to end):

custom_padding = 2
final_video = concatenate_videoclips(
    [
        clip1,
        clip2.crossfadein(custom_padding),
        clip3.crossfadein(custom_padding)
    ],
    padding=-custom_padding,
    method="compose"
)
final_video.write_videofile(target_path, fps=clip1.fps)  # add any remaining params
Let's say I have two neural networks represented as Python classes A and B. Methods A.run() and B.run() represent a feedforward inference for one image.
As an example, A.run() takes 100 ms, and B.run() takes 50 ms.
When run one after another, i.e.
img = cap.read()[1] # e.g. cv2.VideoCapture instance
start_time = time.time()
A.run(img) # 100 ms
B.run(img) # 50 ms
time_diff = time.time() - start_time # 100 + 50 = 150 ms
the inference times just add up to 150 ms.
To make this faster, we can try parallelizing so that they start at the same time. An implementation that uses Python's threading is outlined below:
class A:
    # This method is spawned using Python's threading library
    def run_queue(self, input_queue, output_queue):
        while True:
            img = input_queue.get()
            start_time = time.time()
            output = self.run(img)
            time_diff = time.time() - start_time  # Supposedly 100 ms for class A, and 50 ms for class B
            output_queue.put(output)

# in main program flow:
# Assume that a_input_queue and a_output_queue are tied to an instance of class A
# And similar for class B
img = cap.read()[1]
a_input_queue.put(img)
b_input_queue.put(img)
start_time = time.time()
a_output = a_output_queue.get()  # Should take 100 ms
b_output = b_output_queue.get()  # B.run() should take 50 ms, but since it started at the same time as A.run(), this get() should effectively return immediately
time_diff = time.time() - start_time  # Should theoretically be 100 ms
So theoretically, we should only be bottlenecked by A, and end up with 100 ms for the whole system.
However, it seems that B.run() takes around 100 ms as well when measured in B.run_queue(). Since they started at around the same time, the whole system also takes around 100 ms.
Does this make sense? Is trying to thread the two neural networks sensible, if the resulting total inference time is about the same (or possibly incrementally faster at least)?
My guess is that the GPU is maxed at 100% for one neural network, so when trying to inference two networks at the same time, it just rearranges the instructions but can only do the same number of computations anyway:
Illustration:
A.run() executes 8 blocks of instructions:
| X | X | X | X | X | X | X | X |
B.run() executes only 4 blocks of instructions:
| Y | Y | Y | Y |
Now, say that the GPU can process 2 blocks of instructions per second.
So, in the case that A.run() and B.run() are run one after the other (non-threaded):
| X | X | X | X | X | X | X | X | Y | Y | Y | Y | -> A.run() takes 4 s, B.run() takes 2 s, everything takes 6 s
In the threaded case, the instructions are interleaved so both start at the same time, but each gets stretched out:
| X | X | Y | X | X | Y | X | X | Y | X | X | Y | -> A.run() roughly takes 6 s, B.run() roughly takes 6 s, everything seems to take 6 s
Is the above illustration the case?
Finally, let's consider a class C similar to B (e.g. inference time=50 ms), except that it uses the CPU. Thus, it shouldn't compete with A in GPU usage, but from experiments, it just behaved like B; its inference time seemed to be stretched to match A's.
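One way to check whether the GIL, rather than the GPU, is what stretches the CPU-bound case: time two pure-Python CPU-bound tasks sequentially and in threads. In CPython only one thread can execute Python bytecode at a time, so if that's the bottleneck the threaded total will be roughly the same as the sequential total. This is a minimal diagnostic sketch, not the networks themselves:

```python
import threading, time

def busy(n):
    # Pure-Python CPU work; holds the GIL while running
    s = 0
    for i in range(n):
        s += i * i
    return s

N = 2_000_000

t0 = time.perf_counter()
busy(N); busy(N)
sequential = time.perf_counter() - t0

t0 = time.perf_counter()
threads = [threading.Thread(target=busy, args=(N,)) for _ in range(2)]
for t in threads: t.start()
for t in threads: t.join()
threaded = time.perf_counter() - t0

# On CPython the two totals come out roughly equal
print(f"sequential: {sequential:.2f}s  threaded: {threaded:.2f}s")
```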
Thoughts? Thanks in advance.
TL;DR - I am working on something that provides information about the entries in an archive file and specifies 'where' the size in the archive is coming from. The example below is far smaller than my real problem (which has hundreds of thousands of entries) but highlights the actual issue I'm running into: there's a non-trivial amount of size in my archive that is unaccounted for (my guess is it's used by the overhead of compression). The sum of the parts of my archive (the total compressed size of all of my entries + the expected gaps between them) is less than the actual size of the archive. How do I inspect the archive in a way that provides insight into this hidden overhead?
Where I'm at:
I have a directory that contains three files:
doc.pdf
cat.jpg
model.stl
Using a freeware program I dump these into a zip file: demo.zip
Using python I can inspect these pretty easily:
import zipfile

info_list = zipfile.ZipFile('demo.zip').infolist()
for i in info_list:
    print i.orig_filename
    print i.compress_size
    print i.header_offset
From this we can work out the layout. The total size of demo.zip is 84469 bytes, and the entries break down as:
|---------------------|-----------------|---------------|
| File | Compressed Size | Header Offset |
|---------------------|-----------------|---------------|
| doc.pdf | 21439 | 0 |
|---------------------|-----------------|---------------|
| cat.jpg | 48694 | 21495 |
|---------------------|-----------------|---------------|
| model.stl | 13870 | 70232 |
|---------------------|-----------------|---------------|
I know that zipping will result in some space between entries (hence the difference between the sum of a previous entry's size and offset, and the next entry's header offset). You can calculate this small 'gap':
gap = offset - previous_entry_size - previous_entry_offset
I can update my chart to look like:
|---------------------|-----------------|---------------|---------------|
| File | Compressed Size | Header Offset | 'Gap' |
|---------------------|-----------------|---------------|---------------|
| doc.pdf | 21439 | 0 | 0 |
|---------------------|-----------------|---------------|---------------|
| cat.jpg | 48694 | 21495 | 56 |
|---------------------|-----------------|---------------|---------------|
| model.stl | 13870 | 70232 | 43 |
|---------------------|-----------------|---------------|---------------|
Cool. So now one might expect that the size of demo.zip would be equal to the sum of the size of all entries and their gaps. (84102 in the example above).
But that's not the case. So, obviously, zipping requires headers and information about how zipping occurred (and how to unzip). But I'm running into a problem on how to define this or access any more information about it.
I could just take 84469 - 84102 and say ~magic zip overhead~ = 367 bytes. But that seems less than ideal because this number obviously is not magic. Is there a way to inspect the underlying zip data that is taking up this space?
An empty zip file is 22 bytes, containing only the End of Central Directory Record.
In [1]: import zipfile
In [2]: z = zipfile.ZipFile('foo.zip', 'w')
In [3]: z.close()
In [4]: import os
In [5]: os.stat('foo.zip').st_size
Out[5]: 22
If the zip-file is not empty, for every file you have a central directory file header (at least 46 bytes), and a local file header (at least 30 bytes).
The actual headers have a variable length because the given lengths do not include space for the file name which is part of the header.
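To make that accounting concrete, here's a sketch using an in-memory archive built with Python's zipfile (file names and contents are made up): each entry costs a 30-byte local header and a 46-byte central-directory header, each of which also stores the file name, plus the 22-byte end-of-central-directory record. For an archive written by zipfile itself (no extra fields or comments), these pieces account for every byte:

```python
import io, zipfile

# Build a small archive in memory
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as z:
    z.writestr("a.txt", b"hello" * 100)
    z.writestr("b.txt", b"world" * 100)

with zipfile.ZipFile(buf) as z:
    overhead = 22  # end-of-central-directory record
    for info in z.infolist():
        # local header + central-directory header + two copies of the name
        overhead += 30 + 46 + 2 * len(info.filename)
    payload = sum(i.compress_size for i in z.infolist())

# Both numbers match for this zipfile-written archive
print(len(buf.getvalue()), payload + overhead)
```

Archives produced by other tools often add extra fields (timestamps, Unix attributes) to those headers, which is where additional "magic" bytes usually hide.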
In a Python dict of 50 items, would there be any noticeable speed difference between matching an integer key (2 digits) to find a string value versus matching a string key (5-10+ letters) to find an integer value, over a large number of loops (100,000+)?
As a minor bonus: is there any benefit to performing an activity like this in MySQL versus Python, if you're able to?
Micro-benchmarking language features is a useful exercise, but you have to take it with a grain of salt. It's hard to do benchmarks in accurate and meaningful ways, and generally what people care about is total performance, not individual feature performance.
I find using a "test harness" makes it easier to run different alternatives in a comparable way.
For dictionary lookup, here's an example using the benchmark module from PyPI. It does 100 randomized runs, setting up dicts of N=50 items each--either int keys and str values or the reverse--then trying both the try/except and get access paradigms. Here's the code:
import benchmark

from random import choice, randint
import string

def str_key(length=8, alphabet=string.ascii_letters):
    return ''.join(choice(alphabet) for _ in xrange(length))

def int_key(min=10, max=99):
    return randint(min, max)

class Benchmark_DictLookup(benchmark.Benchmark):

    each = 100  # allows for differing number of runs

    def setUp(self):
        # Only using setUp in order to subclass later
        # Can also specify tearDown, eachSetUp, and eachTearDown
        self.size = 1000000
        self.n = 50
        self.intdict = { int_key():str_key() for _ in xrange(self.n) }
        self.strdict = { str_key():int_key() for _ in xrange(self.n) }
        self.intkeys = [ int_key() for _ in xrange(self.size) ]
        self.strkeys = [ str_key() for _ in xrange(self.size) ]

    def test_int_lookup(self):
        d = self.intdict
        for key in self.intkeys:
            try:
                d[key]
            except KeyError:
                pass

    def test_int_lookup_get(self):
        d = self.intdict
        for key in self.intkeys:
            d.get(key, None)

    def test_str_lookup(self):
        d = self.strdict
        for key in self.strkeys:
            try:
                d[key]
            except KeyError:
                pass

    def test_str_lookup_get(self):
        d = self.strdict
        for key in self.strkeys:
            d.get(key, None)

class Benchmark_Hashing(benchmark.Benchmark):

    each = 100  # allows for differing number of runs

    def setUp(self):
        # Only using setUp in order to subclass later
        # Can also specify tearDown, eachSetUp, and eachTearDown
        self.size = 100000
        self.intkeys = [ int_key() for _ in xrange(self.size) ]
        self.strkeys = [ str_key() for _ in xrange(self.size) ]

    def test_int_hash(self):
        for key in self.intkeys:
            id(key)

    def test_str_hash(self):
        for key in self.strkeys:
            id(key)

if __name__ == '__main__':
    benchmark.main(format="markdown", numberFormat="%.4g")
And the results:
$ python dictspeed.py
Benchmark Report
================
Benchmark DictLookup
--------------------
name | rank | runs | mean | sd | timesBaseline
---------------|------|------|--------|---------|--------------
int lookup get | 1 | 100 | 0.1756 | 0.01619 | 1.0
str lookup get | 2 | 100 | 0.1859 | 0.01477 | 1.05832996073
int lookup | 3 | 100 | 0.5236 | 0.03935 | 2.98143047487
str lookup | 4 | 100 | 0.8168 | 0.04961 | 4.65108861267
Benchmark Hashing
-----------------
name | rank | runs | mean | sd | timesBaseline
---------|------|------|----------|-----------|--------------
int hash | 1 | 100 | 0.008738 | 0.000489 | 1.0
str hash | 2 | 100 | 0.008925 | 0.0002952 | 1.02137781609
Each of the above 600 runs were run in random, non-consecutive order by
`benchmark` v0.1.5 (http://jspi.es/benchmark) with Python 2.7.5
Darwin-13.4.0-x86_64 on 2014-10-28 19:23:01.
Conclusion: String lookup in dictionaries is not that much more expensive than integer lookup. BUT the supposedly Pythonic "ask forgiveness not permission" paradigm takes much longer than simply using the get method call. Also, hashing a string (at least of size 8) is not much more expensive than hashing an integer.
But then things get even more interesting if you run on a different implementation, like PyPy:
$ pypy dictspeed.py
Benchmark Report
================
Benchmark DictLookup
--------------------
name | rank | runs | mean | sd | timesBaseline
---------------|------|------|---------|-----------|--------------
int lookup get | 1 | 100 | 0.01538 | 0.0004682 | 1.0
str lookup get | 2 | 100 | 0.01993 | 0.001117 | 1.295460397
str lookup | 3 | 100 | 0.0203 | 0.001566 | 1.31997704025
int lookup | 4 | 100 | 0.02316 | 0.001056 | 1.50543635375
Benchmark Hashing
-----------------
name | rank | runs | mean | sd | timesBaseline
---------|------|------|-----------|-----------|--------------
str hash | 1 | 100 | 0.0005657 | 0.0001609 | 1.0
int hash | 2 | 100 | 0.006066 | 0.0005283 | 10.724346492
Each of the above 600 runs were run in random, non-consecutive order by
`benchmark` v0.1.5 (http://jspi.es/benchmark) with Python 2.7.8
Darwin-13.4.0-x86_64 on 2014-10-28 19:23:57.
PyPy is about 11x faster, best case, but the ratios are much different. PyPy doesn't suffer the significant exception-handling cost that CPython does. And, hashing an integer is 10x slower than hashing a string. How about that for an unexpected result?
I would have tried Python 3, but benchmark didn't install well there. I also tried increasing the string length to 50. It didn't markedly change the results, the ratios, or the conclusions.
Overall, hashing and lookups are so fast that, unless you have to do them by the millions or billions, or have extraordinarily long keys, or some other unusual circumstance, developers generally needn't be concerned about their micro-performance.
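For Python 3, where the benchmark package doesn't install cleanly, a rough stdlib-only equivalent can be sketched with timeit (the dict and probe sizes mirror the setup above; exact numbers will vary by machine):

```python
import random, string, timeit

random.seed(0)

# 50 distinct two-digit int keys, and 50 random 8-letter string keys
intdict = {k: "value" for k in random.sample(range(10, 100), 50)}
strdict = {"".join(random.choice(string.ascii_letters) for _ in range(8)): 0
           for _ in range(50)}

# 100,000 probe keys for each dict (some int probes will be misses)
int_probes = [random.randint(10, 99) for _ in range(100_000)]
str_pool = list(strdict)
str_probes = [random.choice(str_pool) for _ in range(100_000)]

def lookup(d, probes):
    # The `get` access paradigm measured above
    for k in probes:
        d.get(k, None)

for name, d, probes in [("int", intdict, int_probes),
                        ("str", strdict, str_probes)]:
    best = min(timeit.repeat(lambda: lookup(d, probes), number=1, repeat=5))
    print(f"{name} lookup: {best:.4f}s")
```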
I have written a simple function to resize an image from 1500x2000px to 900x1200px.
def resizeImage(file_list):
    if file_list:
        if not os.path.exists('resized'):
            os.makedirs('resized')
        i = 0
        for files in file_list:
            i += 1
            im = Image.open(files)
            im = im.resize((900,1200),Image.ANTIALIAS)
            im.save('resized/' + files, quality=90)
        print str(i) + " files resized successfully"
    else:
        print "No files to resize"
I used the timeit function to measure how long it takes to run with some example images. Here is an example of the results.
+---------------+-----------+---------------+---------------+---------------+
| Test Name | No. files | Min | Max | Average |
+---------------+-----------+---------------+---------------+---------------+
| Resize normal | 10 | 5.25000018229 | 5.31371171493 | 5.27186083393 |
+---------------+-----------+---------------+---------------+---------------+
But if I repeat the test, the times gradually keep increasing, i.e.
+---------------+-----------+---------------+---------------+---------------+
| Test Name | No. files | Min | Max | Average |
+---------------+-----------+---------------+---------------+---------------+
| Resize normal | 10 | 5.36660298734 | 5.57177596057 | 5.45903467485 |
+---------------+-----------+---------------+---------------+---------------+
+---------------+-----------+---------------+---------------+---------------+
| Test Name | No. files | Min | Max | Average |
+---------------+-----------+---------------+---------------+---------------+
| Resize normal | 10 | 5.58739076382 | 5.76515489024 | 5.70014196601 |
+---------------+-----------+---------------+---------------+---------------+
+---------------+-----------+---------------+---------------+-------------+
| Test Name | No. files | Min | Max | Average |
+---------------+-----------+---------------+---------------+-------------+
| Resize normal | 10 | 5.77366483042 | 6.00337707034 | 5.891541538 |
+---------------+-----------+---------------+---------------+-------------+
+---------------+-----------+---------------+--------------+---------------+
| Test Name | No. files | Min | Max | Average |
+---------------+-----------+---------------+--------------+---------------+
| Resize normal | 10 | 5.91993466793 | 6.1294756299 | 6.03516199948 |
+---------------+-----------+---------------+--------------+---------------+
This is how I'm running the test:
def resizeTest(repeats):
    os.chdir('C:/Users/dominic/Desktop/resize-test')
    files = glob.glob('*.jpg')
    t = timeit.Timer(
        "resizeImage(filess)",
        setup="from imageToolkit import resizeImage; import glob; filess = glob.glob('*.jpg')"
    )
    time = t.repeat(repeats, 1)
    results = {
        'name': 'Resize normal',
        'files': len(files),
        'min': min(time),
        'max': max(time),
        'average': averageTime(time)
    }
    resultsTable(results)
I have moved the images being processed from my mechanical hard drive to the SSD and the issue persists. I have also checked the memory being used: it stays pretty steady through all the runs, topping out at around 26Mb, and the process uses around 12% of one CPU core.
Going forward I'd like to experiment with the multiprocessing library to increase the speed, but I'd like to get to the bottom of this issue first.
Could there be an issue with my loop that causes the performance to degrade?
The im.save() call is slowing things down; repeated writing to the same directory is perhaps thrashing OS disk caches. When you removed the call, the OS was able to optimize the image read access times via disk caches.
If your machine has multiple CPU cores, you can indeed speed up the resize process, as the OS will schedule multiple sub-processes across those cores to run each resize operation. You'll not get a linear performance improvement, as all those processes still have to access the same disk for both reads and writes.
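A multiprocessing version of the resize loop could be sketched roughly as follows (names like resize_one and the resized output directory are illustrative; newer Pillow versions spell the ANTIALIAS filter LANCZOS):

```python
import os
from multiprocessing import Pool

def output_path(src, out_dir="resized"):
    # Destination path for a resized copy of `src`
    return os.path.join(out_dir, os.path.basename(src))

def resize_one(src):
    # Pillow is imported inside the worker so each subprocess loads it itself
    from PIL import Image
    im = Image.open(src)
    im = im.resize((900, 1200), Image.LANCZOS)
    dst = output_path(src)
    im.save(dst, quality=90)
    return dst

def resize_all(file_list, workers=4):
    os.makedirs("resized", exist_ok=True)
    # Each resize runs in its own subprocess, spread across CPU cores
    with Pool(workers) as pool:
        return pool.map(resize_one, file_list)

if __name__ == "__main__":
    import glob
    jpgs = glob.glob("*.jpg")
    if jpgs:
        print(resize_all(jpgs))
```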