I wrote a function (in Python) that saves an image at a given quality level (0 to 100, higher is better). Depending on the quality level, the resulting file is bigger or smaller (in bytes).
I need to write a function that selects the best possible quality level for an image while keeping it under a maximum file size (in bytes). The only way I can do it is by trial and error.
The simplest approach is: save the file at a certain quality and, if it is bigger than expected, reduce the quality and try again.
Unfortunately this approach is very time consuming. Even if I reduce the quality by five points at each iteration, I risk having to save the same file 21 times before I find the right quality.
Another solution is a binary search: try the midpoint of the quality range, then narrow the search to the lower or the upper half according to the result.
Let me clarify:
1. Assuming quality can be between 0 and 100, try quality = 50.
2. If the file size is higher than expected, focus on the lower quality range (e.g. 0-49), otherwise on the higher one (e.g. 51-100).
3. Set the quality to the middle of the chosen range and save the file (e.g. 25 for the lower range, 75 for the higher one); return to step 2.
4. Exit when the range is smaller than 5.
This second solution always requires about 6 iterations.
Here is a Python implementation:
limit_file_size = 512  # maximum file size, change this for different results
q_max = 100
q_min = 0
quality = q_min + (q_max - q_min) // 2
while True:
    file_size = save_file(quality)
    if (q_max - q_min) <= 5:
        break
    if file_size > limit_file_size:
        q_max = quality
    else:
        q_min = quality
    quality = q_min + (q_max - q_min) // 2
Please note that the function save_file is not provided for brevity; a fake implementation of it could be the following:
import math

def save_file(quality):
    return int(math.sqrt(quality)) * 100
How can I reduce the number of iterations the above function needs to converge to a valid solution?
You can do a binary search like you mentioned and cache the result in a dictionary, so that on the next call it first checks whether the quality for a given file size has already been computed, like this:
# { file_size : quality }
{ 512 : 100 , 256 : 100, 1024 : 27 }
Note that each image has different dimensions, colour depth, format, etc., so you may get a range of results. I suggest you experiment and create sub-keys for the properties that have the most impact, for example:
{ format : { file_size : quality } }
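A minimal sketch of that caching idea, assuming a save_file(quality) callable that returns the resulting size in bytes (the function name and cache layout here are illustrative, not a fixed API):

```python
def best_quality(save_file, limit_file_size, cache, q_min=0, q_max=100):
    """Binary-search the highest quality whose file size fits the limit,
    reusing answers for size limits we have already solved."""
    if limit_file_size in cache:
        return cache[limit_file_size]  # no saves needed at all
    quality = q_min + (q_max - q_min) // 2
    while (q_max - q_min) > 5:
        if save_file(quality) > limit_file_size:
            q_max = quality
        else:
            q_min = quality
        quality = q_min + (q_max - q_min) // 2
    cache[limit_file_size] = quality
    return quality
```

On a cache hit no file is saved at all; on a miss you pay the usual ~6 saves once and never again for that size limit.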
You could try a simple ML (machine learning) approach: train a model that, given
- the image size at quality 50
- limit_file_size
as input, produces the best quality you seek, or at least narrows down your search so that you need 2-3 iterations instead of 6.
You would have to gather training (and validation) data and train a model that is simple and fast (it should be faster than save_file).
I think this is the best approach in terms of determining the quality as fast as possible, but it requires a lot of work (especially if you have no experience in ML).
def SSIM_compute(files, WorkingFolder, DestinationAlikeFolder, DestinationUniqueFolder, results, start_time):
    NumberAlike = 0
    loop = 1
    while True:
        files = os.listdir(WorkingFolder)
        if files == []:
            break
        IsAlike = False
        CountAlike = 1
        print("Loop : " + str(loop) + " --- starttime : " + str(time.time() - start_time))
        for i in range(1, len(files)):
            #print("\ti= "+str(i)+" : "+str(time.time()-start_time))
            img1 = cv2.imread(WorkingFolder + "/" + files[0])
            img2 = cv2.imread(WorkingFolder + "/" + files[i])
            x1, y1 = img1.shape[:2]
            x2, y2 = img2.shape[:2]
            x = min(x1, x2)
            y = min(y1, y2)
            img1 = cv2.resize(img1, (x, y), 1)
            img2 = cv2.resize(img2, (x, y), 1)
            threshold = ssim(img1, img2, multichannel=True)
            if threshold > 0.8:
                IsAlike = True
                if os.path.exists((WorkingFolder + "/" + files[i])):
                    shutil.move((WorkingFolder + "/" + files[i]), DestinationAlikeFolder + "/alike" + str(NumberAlike) + "_" + str(CountAlike) + ".jpg")
                    CountAlike += 1
                    #results.write("ALIKE : " +files[0] +" --- " +files[i]+"\n")
                    results.write("ALIKE : /alike" + str(NumberAlike) + "_0" + ".jpg --- /alike" + str(NumberAlike) + "_" + str(CountAlike) + ".jpg -> " + str(threshold))
        if IsAlike:
            if os.path.exists((WorkingFolder + "/" + files[0])):
                shutil.move((WorkingFolder + "/" + files[0]), DestinationAlikeFolder + "/alike" + str(NumberAlike) + "_0" + ".jpg")
                NumberAlike += 1
        else:
            if os.path.exists((WorkingFolder + "/" + files[0])):
                shutil.move((WorkingFolder + "/" + files[0]), DestinationUniqueFolder)
        loop += 1
I have this code that must compare images to determine whether they are identical or whether some of them were modified (compression, artefacts, etc.).
To check whether two images are strictly identical, I just compute and compare their respective hashes (in another function not shown here), and to check whether they are similar I compute the SSIM of the two files.
The next part is where the trouble begins: when I test this code on a fairly small set of pictures (approx. 50), the execution time is decent, but if I make the set bigger (something like 200 pictures), the execution time becomes way too high (several hours), as expected given the two nested loops.
As I'm not very creative, does anybody have ideas to reduce the execution time on a larger dataset? Maybe a method to avoid those nested loops?
Thank you for any help :)
You're comparing each image with every other image - you could pull reading the first image img1 out of the for loop and do it just once per file.
But as you're comparing each file with every other file, that's going to slow down as O(N^2/2), i.e. 200 pictures will be 16x slower than 50. Maybe you could resize to a much smaller size like 64x64, which would be much quicker to compare with ssim(), and only if similar at that small size do a full-size comparison?
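A sketch of that two-stage idea, with a pluggable compare function so you can drop in ssim; the shrink helper here uses plain NumPy striding as a crude stand-in for cv2.resize, and all names are illustrative:

```python
import numpy as np

def shrink(img, step=8):
    # Crude stand-in for cv2.resize: keep every step-th pixel along each axis.
    return img[::step, ::step]

def similar(img1, img2, compare, threshold=0.8):
    """Two-stage comparison: cheap thumbnail check first; run the expensive
    full-size comparison only when the thumbnails already look alike."""
    if compare(shrink(img1), shrink(img2)) <= threshold:
        return False  # cheap rejection, no full-size work done
    return compare(img1, img2) > threshold
```

In the real code, compare would be something like lambda a, b: ssim(a, b, multichannel=True) and shrink would be cv2.resize(img, (64, 64)); the thumbnail pass rejects most dissimilar pairs cheaply, so full-size SSIM runs only on likely matches.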
I have a .wav file that contains 2 types of sounds: long and short. I need to encode them as bits and write them to a binary file.
I got the code from this SO answer: https://stackoverflow.com/a/53309191/2588339 and using it I get this plot for my input wav file:
As you can see, there are narrower and wider bursts in the first plot, corresponding to the shorter and longer sounds in my file.
My question is how can I encode each one of the sounds as a bit? Like having each long sound in the file represent a 1 and a short sound represent a 0.
EDIT: The 2 types of sound differ in duration and also in frequency: the longer sound is lower frequency and the shorter sound is higher frequency. You can find a sample of the file here: https://vocaroo.com/i/s0A1weOF3I3f
Measuring the loudness of each frequency by taking the FFT of the signal is the more "scientific" way to do it, but the image of the raw signal suggests you can get away with something much easier than that.
If you take a sliding window (at least as wide as 1 period of the primary frequency of the sound (~300 Hz)) and find the maximum value within that window, it should be fairly easy to apply a threshold to determine whether the tone is playing in a given time interval. Here's a quick article I found on rolling window functions.
import numpy as np

def rolling_window(a, window):
    shape = a.shape[:-1] + (a.shape[-1] - window + 1, window)
    strides = a.strides + (a.strides[-1],)
    return np.lib.stride_tricks.as_strided(a, shape=shape, strides=strides)

window_size = int(sample_rate / primary_freq)  # minimum size window, could be larger
rolling_max = np.max(rolling_window(wav_data, window_size), -1)
threshold_max = rolling_max > threshold  # maybe about 1000ish based on your graph
Then simply determine the length of the runs of True in threshold_max. Again, I'll pull on the community from this answer showing a concise way to get the run length of an array (or other iterable).
import itertools

def runs_of_ones(bits):
    for bit, group in itertools.groupby(bits):
        if bit:
            yield sum(group)

run_lengths = list(runs_of_ones(threshold_max))
The values in run_lengths should now be the length of each "on" pulse of sound in # of samples. It should now be relatively straightforward for you to test each value if it's long or short and write to a file.
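That last step could be sketched like this; the split point between "short" and "long" runs is an assumption you would tune from your own data:

```python
import itertools

def runs_to_bits(active, split):
    """Turn a boolean tone-on/off array into bits: each run of True samples
    becomes 1 if it lasts longer than `split` samples, else 0."""
    bits = []
    for value, group in itertools.groupby(active):
        if value:
            run_length = sum(1 for _ in group)
            bits.append(1 if run_length > split else 0)
    return bits
```

The resulting bits could then be packed eight at a time into a bytes object and written with open('out.bin', 'wb').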
Following is a small snippet from the full code
I am trying to understand the logic behind this method of splitting a dataset.
A SHA-1 digest is 40 hexadecimal characters. What kind of probability is being computed in the expression?
What is the reason for (MAX_NUM_IMAGES_PER_CLASS + 1)? Why add 1?
Does setting different values of MAX_NUM_IMAGES_PER_CLASS affect the split quality?
How good a split does this give? Is this a recommended way of splitting datasets?
# We want to ignore anything after '_nohash_' in the file name when
# deciding which set to put an image in, the data set creator has a way of
# grouping photos that are close variations of each other. For example
# this is used in the plant disease data set to group multiple pictures of
# the same leaf.
hash_name = re.sub(r'_nohash_.*$', '', file_name)
# This looks a bit magical, but we need to decide whether this file should
# go into the training, testing, or validation sets, and we want to keep
# existing files in the same set even if more files are subsequently
# added.
# To do that, we need a stable way of deciding based on just the file name
# itself, so we do a hash of that and then use that to generate a
# probability value that we use to assign it.
hash_name_hashed = hashlib.sha1(compat.as_bytes(hash_name)).hexdigest()
percentage_hash = ((int(hash_name_hashed, 16) %
                    (MAX_NUM_IMAGES_PER_CLASS + 1)) *
                   (100.0 / MAX_NUM_IMAGES_PER_CLASS))
if percentage_hash < validation_percentage:
    validation_images.append(base_name)
elif percentage_hash < (testing_percentage + validation_percentage):
    testing_images.append(base_name)
else:
    training_images.append(base_name)
result[label_name] = {
    'dir': dir_name,
    'training': training_images,
    'testing': testing_images,
    'validation': validation_images,
}
This code is simply distributing file names “randomly” (but reproducibly) over a number of bins and then grouping the bins into just the three categories. The number of bits in the hash is irrelevant (so long as it’s “enough”, which is probably about 35 for this sort of work).
Reducing modulo n+1 produces a value on [0,n], and multiplying that by 100/n obviously produces a value on [0,100], which is being interpreted as a percentage. n being MAX_NUM_IMAGES_PER_CLASS is meant to control the rounding error in the interpretation to be no more than “one image”.
This strategy is reasonable, but looks a bit more sophisticated than it is (since there is still rounding going on, and the remainder introduces a bias—although with numbers this large it is utterly unobservable). You could make it simpler and more accurate by simply precalculating ranges over the whole space of 2^160 hashes for each class and just checking the hash against the two boundaries. That still notionally involves rounding, but with 160 bits it’s only that intrinsic to representing decimals like 31% in floating point.
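The whole scheme can be sketched in a few lines; hashlib.sha1 is the real API, and the percentages here are illustrative defaults:

```python
import hashlib

MAX_NUM_IMAGES_PER_CLASS = 2 ** 27 - 1  # ~134M, as in the TensorFlow code

def assign_set(file_name, validation_percentage=10, testing_percentage=10):
    """Deterministically map a file name to a split: the SHA-1 digest acts
    as a stable 'random' number, so the same name always lands in the same
    set, even when new files are added later."""
    digest = hashlib.sha1(file_name.encode('utf-8')).hexdigest()
    percentage_hash = (int(digest, 16) % (MAX_NUM_IMAGES_PER_CLASS + 1)) * (
        100.0 / MAX_NUM_IMAGES_PER_CLASS)
    if percentage_hash < validation_percentage:
        return 'validation'
    if percentage_hash < validation_percentage + testing_percentage:
        return 'testing'
    return 'training'
```

Because the assignment depends only on the name, growing the dataset never moves an existing file between sets, which is the whole point of hashing instead of drawing random numbers.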
I am new to programming, particularly to Python. I am trying to implement an auditory model using 4th-order gammatone filters. I need to break a signal down into 39 channels. When I used a smaller signal (about 884726 bits), the code ran fine, but I think the buffers fill up, so I have to restart the shell to run the code a second time. I tried flush() but it didn't work.
So I decided to window the signal using a Hanning window, but couldn't succeed at that either. To be clear, I need to break an audio signal into 39 channels, rectify it (half wave), and then pass it into a second bank of 4th-order filters, this time with 10 channels. I am trying to downsample the signal before sending it into the second bank of filters. This is the piece of code that implements the filter bank using the coefficients generated by another function. The dimensions of b are 39x4096.
def filterbank_application(input, b, verbose=False):
    """
    Run the input through a bandpass filter bank with parameters defined by the b coefficients.

    Parameters:
    * input (type: array-like matrix of floats) - input signal. (Required)
    * b (type: array-like matrix of floats) - the b coefficients of each filter in shape b[numOfFilters][numOfCoeffs]. (Required)

    Returns:
    * y (type: numpy array of floats) - an array with inner dimensions equal to that of the input and outer dimension equal to
      the number of bandpass filters in the bank, containing the outputs of each filter. The output
      signal of the nth filter can be accessed using y[n].
    """
    input = np.array(input)
    bshape = np.shape(b)
    nFilters = bshape[0]
    lengthFilter = bshape[1]
    shape = (nFilters,) + (np.shape(input))
    shape = np.array(shape[0:])
    shape[-1] = shape[-1] + lengthFilter - 1
    y = np.zeros((shape))
    for i in range(nFilters):
        if verbose:
            sys.stdout.write("\r" + str(int(np.round(100.0 * i / nFilters))) + "% complete.")
            sys.stdout.flush()
        x = np.array(input)
        y[i] = signal.fftconvolve(x, b[i])
    if verbose:
        sys.stdout.write("\n")
    return y
samplefreq,input = wavfile.read('sine_sweep.wav')
input = input.transpose()
input = (input[0] + input[1])/2
b_coeff1 = gammatone_filterbank(samplefreq, 39)
Output = filterbank_application(input, b_coeff1)
Rect_Output = half_rectification(Output)
I want to window the audio into chunks of 20 seconds. I would appreciate it if you could suggest an efficient way of windowing my signal, as the full audio will be 6 times bigger than the signal I am using now. Thanks in advance.
You may have a problem with memory consumption if you run a 32-bit Python. Your code consumes approximately 320 octets (bytes) per sample (40 buffers, 8 octets per sample). The maximum memory available is 2 GB, which means the absolute maximum size for the signal is around 6 million samples. If your file is around 100 seconds long, you may start having problems.
There are two ways out of that problem (if that really is the problem, but I cannot see any evident reason why your code would otherwise crash). Either get a 64-bit Python or rewrite your code to use memory in a more practical way.
If I have understood your problem correctly, you want to:
run the signal through 39 FIR filters (4096 points each)
half-rectify the resulting signals
downsample the resulting half-rectified signal
filter each of the downsampled rectified signals by 10 FIR filters (or IIR?)
This will give you 39 x 10 signals, which give you the attack and frequency response of the incoming auditory signal.
How I would do this is:
take the original signal and keep it in memory (if it does not fit, that can be fixed by a trick called memmap, but if your signal is not very long it will fit)
take the first gammatone filter and run the convolution (scipy.signal.fftconvolve)
run the half-wave rectification (sig = np.clip(sig, 0, None, out=sig))
downsample the signal (e.g. scipy.signal.decimate)
run the 10 filters (e.g. scipy.signal.fftconvolve)
repeat steps 2-5 for all other gammatones
This way you do not need to keep the 39 copies of the filtered signal in memory, if you only need the end results.
Without seeing the complete application and knowing more about the environment it is difficult to say whether you really have a memory problem.
Just a stupid signal-processing question: Why half-wave rectification? Why not full-wave rectification: sig = np.abs(sig)? The low-pass filtering is easier with full-wave rectified signal, and the audio signals should anyway be rather symmetric.
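The per-channel loop in steps 2-5 above could be sketched like this; np.convolve and plain slicing stand in for scipy.signal.fftconvolve and scipy.signal.decimate so the sketch stays self-contained, and the argument names are placeholders for your own coefficients:

```python
import numpy as np

def process(sig, gammatones, lowpass_bank, decimate_by=8):
    """Process one gammatone channel at a time, so only a single filtered
    copy of the signal is ever held in memory."""
    results = []
    for g in gammatones:                  # step 2: convolve with one gammatone
        band = np.convolve(sig, g)
        np.clip(band, 0, None, out=band)  # step 3: half-wave rectification
        band = band[::decimate_by]        # step 4: (crude) downsampling
        # step 5: run the second, smaller filter bank on the downsampled band
        results.append([np.convolve(band, lp) for lp in lowpass_bank])
    return results  # len(gammatones) x len(lowpass_bank) output signals
```

With real coefficients you would swap np.convolve for scipy.signal.fftconvolve (much faster for 4096-tap filters) and the slicing for scipy.signal.decimate, which applies an anti-aliasing filter before downsampling.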
There are a few things which you might want to change in your code:
you convert input into an array as the first thing in your function - there is no need to do it again within the loop (just use input instead of x when running the fftconvolve)
creating an empty y could be done by y = np.empty((b.shape[0], input.shape[0] + b.shape[1] - 1)); this is more readable and gets rid of a number of unnecessary variables
input.transpose() takes some time and memory and is not required. You may instead do: input = np.average(input, axis=1) This will average every row in the array (i.e. average the channels).
There is nothing wrong with your sys.stdout.write, etc. There the flush is used because otherwise the text is written into a buffer and only shown on the screen when the buffer is full.
I am solving homework 1 of the Caltech Machine Learning Course (http://work.caltech.edu/homework/hw1.pdf). To solve questions 7-10 we need to implement a PLA. This is my implementation in Python:
import sys, math, random

w = []       # stores the weights
data = []    # stores the vector X(x1,x2,...)
output = []  # stores the output(y)

# returns 1 if dot product is more than 0
def sign_dot_product(x):
    global w
    dot = sum([w[i] * x[i] for i in xrange(len(w))])
    if dot > 0:
        return 1
    else:
        return -1

# checks if a point is misclassified
def is_misclassified(rand_p):
    return (True if sign_dot_product(data[rand_p]) != output[rand_p] else False)

# loads data in the following format:
# x1 x2 ... y
# In the present case for d=2
# x1 x2 y
def load_data():
    f = open("data.dat", "r")
    global w
    for line in f:
        data_tmp = ([1] + [float(x) for x in line.split(" ")])
        data.append(data_tmp[0:-1])
        output.append(data_tmp[-1])

def train():
    global w
    w = [random.uniform(-1, 1) for i in xrange(len(data[0]))]  # initializes w with random weights
    iter = 1
    while True:
        rand_p = random.randint(0, len(output) - 1)  # randomly picks a point
        check = [0] * len(output)  # check is a list. The ith location is 1 if the ith point is correctly classified
        while not is_misclassified(rand_p):
            check[rand_p] = 1
            rand_p = random.randint(0, len(output) - 1)
            if sum(check) == len(output):
                print "All points successfully satisfied in ", iter - 1, " iterations"
                print iter - 1, w, data[rand_p]
                return iter - 1
        sign = output[rand_p]
        w = [w[i] + sign * data[rand_p][i] for i in xrange(len(w))]  # changing weights
        if iter > 1000000:
            print "greater than 1000"
            print w
            return 10000000
        iter += 1

load_data()

def simulate():
    #tot_iter=train()
    tot_iter = sum([train() for x in xrange(100)])
    print float(tot_iter) / 100

simulate()
The problem: according to the answer to question 7, it should take around 15 iterations for the perceptron to converge for the given training-set size, but my implementation takes an average of 50000 iterations. The training data is supposed to be randomly generated, but I am generating data for simple lines such as x=4, y=2, etc. Is this the reason why I am getting the wrong answer, or is something else wrong? A sample of my training data (separable using y=2):
1 2.1 1
231 100 1
-232 1.9 -1
23 232 1
12 -23 -1
10000 1.9 -1
-1000 2.4 1
100 -100 -1
45 73 1
-34 1.5 -1
It is in the format x1 x2 output(y)
It is clear that you are doing a great job learning both Python and classification algorithms with your effort.
However, because of some of the stylistic inefficiencies with your code, it makes it difficult to help you and it creates a chance that part of the problem could be a miscommunication between you and the professor.
For example, does the professor wish for you to use the Perceptron in "online mode" or "offline mode"? In "online mode" you should move sequentially through the data points and you should not revisit any points. From the assignment's conjecture that it should require only 15 iterations to converge, I am curious whether this implies the first 15 data points, in sequential order, would result in a classifier that linearly separates your data set.
By instead sampling randomly with replacement, you might be causing yourself to take much longer (although, depending on the distribution and size of the data sample, this is admittedly unlikely since you'd expect roughly that any 15 points would do about as well as the first 15).
The other issue is that after you detect a correctly classified point (cases when not is_misclassified evaluates to True) if you then witness a new random point that is misclassified, then your code will kick down into the larger section of the outer while loop, and then go back to the top where it will overwrite the check vector with all 0s.
This means that the only way your code will detect that it has correctly classified all the points is if the particular random sequence that it evaluates them (in the inner while loop) happens to be a string of all 1's except for the miraculous ability that on any particular 0, on that pass through the array, it classifies correctly.
I can't quite formalize why I think that will make the program take much longer, but it seems like your code is requiring a much stricter form of convergence, where it sort of has to learn everything all at once on one monolithic pass way late in the training stage after having been updated a bunch already.
One easy way to check if my intuition about this is crappy would be to move the line check=[0]*len(output) outside of the while loop altogether and only initialize it one time.
Some general advice to make the code easier to manage:
Don't use global variables. Instead, let your function to load and prep the data return things.
There are a few places where you say, for example,
return (True if sign_dot_product(data[rand_p])!=output[rand_p] else False)
This kind of thing can be simplified to
return sign_dot_product(data[rand_p]) != output[rand_p]
which is easier to read and conveys what criteria you're trying to check for in a more direct manner.
I doubt efficiency plays an important role since this seems to be a pedagogical exercise, but there are a number of ways to refactor your use of list comprehensions that might be beneficial. And if possible, just use NumPy which has native array types. Witnessing how some of these operations have to be expressed with list operations is lamentable. Even if your professor doesn't want you to implement with NumPy because she or he is trying to teach you pure fundamentals, I say just ignore them and go learn NumPy. It will help you with jobs, internships, and practical skill with these kinds of manipulations in Python vastly more than fighting with the native data types to do something they were not designed for (array computing).
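For reference, a compact NumPy version of the PLA update loop might look like the following; the data layout and names are illustrative, not taken from the assignment:

```python
import numpy as np

def train_perceptron(X, y, max_iters=10000, rng=None):
    """Perceptron learning algorithm. X is (n_points, n_features) with a
    leading column of 1s for the bias term; y holds labels in {-1, +1}."""
    rng = np.random.default_rng(rng)
    w = np.zeros(X.shape[1])
    for it in range(max_iters):
        # Recompute the full set of misclassified points each pass
        # (sign(0) = 0 also counts as misclassified here).
        wrong = np.where(np.sign(X @ w) != y)[0]
        if wrong.size == 0:
            return w, it  # converged: everything classified correctly
        p = rng.choice(wrong)  # pick one misclassified point at random
        w = w + y[p] * X[p]    # the PLA update
    return w, max_iters
```

Recomputing the misclassified set on every pass makes the convergence check trivial and sidesteps the check-vector bookkeeping entirely.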