I have some questions regarding the Perlin noise and the pv.sample_function in general.
How would you go about applying Perlin noise to a sphere? I would like to have a little bit disformed sphere.
Can you apply Perlin noise to a mesh (sphere/plane) multiple times? I would like to have a plane with some rough 'waves' and high detailed noise on top of them (thus having big waves with little waves in them).
What exactly does the third parameter in the frequency do? After playing around with some values I haven't noticed how it affected the noise.
These are the two different frequencies/Perlin noises that I would like to apply to one plane. Additionally, it shows the plane they respectively create.
def smooth_and_plot(sampled : pv.core.grid.UniformGrid):
mesh = sampled.warp_by_scalar('scalars')
mesh = mesh.extract_surface()
# clean and smooth a little to reduce perlin noise artifacts
mesh = mesh.smooth(n_iter=100, inplace=True, relaxation_factor=0.07)
mesh.plot()
def gravel_plane():
freq = [180, 180, 50]
noise = pv.perlin_noise(0.2, freq, (0, 0, 0))
sampled = pv.sample_function(noise,
bounds=(-10, 2, -10, 10, -10, 10),
dim=(500, 500, 1))
smooth_and_plot(sampled)
def bumpy_plane():
freq = [0.5, 0.7, 0]
noise = pv.perlin_noise(0.5, freq, (-10, -10, -10))
sampled = pv.sample_function(noise,
bounds=(-10, 2, -10, 10, -10, 10),
dim=(500, 500, 1))
smooth_and_plot(sampled)
Let me answer your questions in reverse order for didactical reasons.
What exactly does the third parameter in the frequency do? After playing around with some values I haven't noticed how it affected the noise.
You didn't see an effect because you were looking at 2d samples, and changing the behaviour along the third axis. The three frequencies specify the granularity of the noise along the x, y and z axes, respectively. In other words, the generated implicit function is a scalar function of three variables. It's just that your sampling reduces the dimensionality to 2.
Frequency might be a surprising quantity when it comes to spatial quantities, but it works the same way as for time. High temporal frequency means short oscillation period, low temporal frequency means long oscillation period. High spatial frequency means short wavelength, low spatial frequency means long wavelength. To be specific, wavelength and frequency are inversely proportional.
So you'll see the effect of the third frequency when you start slicing along the z axis:
import pyvista as pv
freq = [0.5, 0.5, 2]
noise = pv.perlin_noise(0.5, freq, (0, 0, 0))
noise_cube = pv.sample_function(noise,
bounds=(-10, 10, -10, 10, -10, 10),
dim=(200, 200, 200))
noise_cube.slice_orthogonal(-9, -9, -9).plot()
As you can see, the blobs in the xy plane are circular, because the two in-plane frequencies are equal. But in both vertical planes the blobs are elongated: they are flatter in the z direction. This is because the frequency along the z axis is four times larger, leading to a wavelength that is four times smaller. This will lead to random blobs having a roughly 4:1 aspect ratio.
Can you apply Perlin noise to a mesh (sphere/plane) multiple times? I would like to have a plane with some rough 'waves' and high detailed noise on top of them (thus having big waves with little waves in them).
All that happens in your snippets is that a function is sampled on a pre-defined rectangular grid, and the resulting values are stored as scalars on the grid. If you want to superimpose two functions, all you have to do is sum up the scalars from two such function calls. This will be somewhat wasteful, as you are generating the same grid twice (and discarding one of the copies), but this is the least exhausting solution from a development point of view:
def bumpy_gravel_plane():
bounds = (-10, 2, -10, 10, -10, 10)
dim = (500, 500, 1)
freq = [180, 180, 50]
noise = pv.perlin_noise(0.2, freq, (0, 0, 0))
sampled_gravel = pv.sample_function(noise, bounds=bounds, dim=dim)
freq = [0.5, 0.7, 0]
noise = pv.perlin_noise(0.5, freq, (-10, -10, -10))
sampled_bumps = pv.sample_function(noise, bounds=bounds, dim=dim)
sampled = sampled_gravel
sampled['scalars'] += sampled_bumps['scalars']
smooth_and_plot(sampled)
How would you go about applying Perlin noise to a sphere? I would like to have a little bit disformed sphere.
The usual solution of generating a 2d texture and applying that to a sphere won't work here, because the noise is not periodic, so you can't easily close it like that. But if you think about it, the generated Perlin noise is a 3d function. You can just sample this 3d function directly on your sphere!
There's one small problem: I don't think you can do that with just pyvista. We'll have to get our hands slightly dirty, and by that I mean using a bare vtk method (namely EvaluateFunction() of the noise). Generate your sphere, and then query the noise function of your choice on its points. If you want the result to look symmetric, you'll have to set the same frequency along all three Cartesian axes:
def bumpy_sphere(R=10):
freq = [0.5, 0.5, 0.5]
noise = pv.perlin_noise(0.5, freq, (0, 0, 0))
sampled = pv.Sphere(radius=R, phi_resolution=100, theta_resolution=100)
# query the noise at each point manually
sampled['scalars'] = [noise.EvaluateFunction(point) for point in sampled.points]
smooth_and_plot(sampled)
I am going through these two librosa docs: melspectrogram and stft.
I am working on datasets of audio of variable lengths, but I don't quite get the shapes. For example:
(waveform, sample_rate) = librosa.load('audio_file')
spectrogram = librosa.feature.melspectrogram(y=waveform, sr=sample_rate)
dur = librosa.get_duration(waveform)
spectrogram = torch.from_numpy(spectrogram)
print(spectrogram.shape)
print(sample_rate)
print(dur)
Output:
torch.Size([128, 150])
22050
3.48
What I get are the following points:
Sample rate is that you get N samples each second, in this case 22050 samples each second.
The window length is the FFT calculated for that period of length of the audio.
STFT is calculation os FFT in small windows of time of audio.
The shape of the output is (n_mels, t). t = duration/window_of_fft.
I am trying to understand or calculate:
What is n_fft? I mean what exactly is it doing to the audio wave? I read in the documentation the following:
n_fft : int > 0 [scalar]
length of the windowed signal after padding with zeros. The number of rows in the STFT matrix D is (1 + n_fft/2). The default value,
n_fft=2048 samples, corresponds to a physical duration of 93
milliseconds at a sample rate of 22050 Hz, i.e. the default sample
rate in librosa.
This means that in each window 2048 samples are taken which means that --> 1/22050 * 2048 = 93[ms]. FFT is being calculated for every 93[ms] of the audio?
So, this means that the window size and window is for filtering the signal in this frame?
In the example above, I understand I am getting 128 number of Mel spectrograms but what exactly does that mean?
And what is hop_length? Reading the docs, I understand that it is how to shift the window from one fft window to the next right? If this value is 512 and n_fft = also 512, what does that mean? Does this mean that it will take a window of 23[ms], calculate FFT for this window and skip the next 23[ms]?
How can I specify that I want to overlap from one FFT window to another?
Please help, I have watched many videos of calculating spectrograms but I just can't seem to see it in real life.
The essential parameter to understanding the output dimensions of spectrograms is not necessarily the length of the used FFT (n_fft), but the distance between consecutive FFTs, i.e., the hop_length.
When computing an STFT, you compute the FFT for a number of short segments. These segments have the length n_fft. Usually these segments overlap (in order to avoid information loss), so the distance between two segments is often not n_fft, but something like n_fft/2. The name for this distance is hop_length. It is also defined in samples.
So when you have 1000 audio samples, and the hop_length is 100, you get 10 features frames (note that, if n_fft is greater than hop_length, you may need to pad).
In your example, you are using the default hop_length of 512. So for audio sampled at 22050 Hz, you get a feature frame rate of
frame_rate = sample_rate/hop_length = 22050 Hz/512 = 43 Hz
Again, padding may change this a little.
So for 10s of audio at 22050 Hz, you get a spectrogram array with the dimensions (128, 430), where 128 is the number of Mel bins and 430 the number of features (in this case, Mel spectra).
I have multiple 2D numpy arrays (image data of a bright object, each of size 600x600), and I ran a cross-correlation on each of the individual images vs. a stacked composite image using skimage.feature.register_translation to obtain the relative subpixel shifts of each image's centroid with respect to the centroid of the composite image. I'd now like to create a weighted 2d histogram of all my individual image data, using the relative shifts of each in order to have all of them exactly centered. But I'm confused on how to do this. My code so far is below (after finding the shifts):
import numpy as np
data = #individual image data; this is an array of multiple 2D (600x600) arrays
# Shifts in x and y (each are same length as 'data')
dx = np.array([0.346, 0.23, 0.113, ...])
dy = np.array([-0.416, -0.298, 0.275, ...])
# Bins
bins = np.arange(-300, 300, 1)
# Weighted histogram
h, xe, ye = np.histogram2d(dx.ravel(), dy.ravel(), bins=bins, weights=data.ravel())
This isn't getting me anywhere though -- I think my weights parameters is wrong (I think there should be just one weight per image, instead of the whole image?), but don't know what else I would put for it. The images are of different bright sources, so I can't just assume they all have the same widths either. How can I accomplish this?
I am trying to zero-center and whiten CIFAR10 dataset, but the result I get looks like random noise!
Cifar10 dataset contains 60,000 color images of size 32x32. The training set contains 50,000 and test set contains 10,000 images respectively.
The following snippets of code show the process I did to get the dataset whitened :
# zero-center
mean = np.mean(data_train, axis = (0,2,3))
for i in range(data_train.shape[0]):
for j in range(data_train.shape[1]):
data_train[i,j,:,:] -= mean[j]
first_dim = data_train.shape[0] #50,000
second_dim = data_train.shape[1] * data_train.shape[2] * data_train.shape[3] # 3*32*32
shape = (first_dim, second_dim) # (50000, 3072)
# compute the covariance matrix
cov = np.dot(data_train.reshape(shape).T, data_train.reshape(shape)) / data_train.shape[0]
# compute the SVD factorization of the data covariance matrix
U,S,V = np.linalg.svd(cov)
print 'cov.shape = ',cov.shape
print U.shape, S.shape, V.shape
Xrot = np.dot(data_train.reshape(shape), U) # decorrelate the data
Xwhite = Xrot / np.sqrt(S + 1e-5)
print Xwhite.shape
data_whitened = Xwhite.reshape(-1,32,32,3)
print data_whitened.shape
outputs:
cov.shape = (3072L, 3072L)
(3072L, 3072L) (3072L,) (3072L, 3072L)
(50000L, 3072L)
(50000L, 32L, 32L, 3L)
(32L, 32L, 3L)
and trying to show the resulting image :
import matplotlib.pyplot as plt
%matplotlib inline
from scipy.misc import imshow
print data_whitened[0].shape
fig = plt.figure()
plt.subplot(221)
plt.imshow(data_whitened[0])
plt.subplot(222)
plt.imshow(data_whitened[100])
plt.show()
By the way the data_train[0].shape is (3,32,32),
but if I reshape the whittened image according to that I get
TypeError: Invalid dimensions for image data
Could this be a visualization issue only? if so how can I make sure thats the case?
Update :
Thanks to #AndrasDeak, I fixed the visualization code this way, but still the output looks random :
data_whitened = Xwhite.reshape(-1,3,32,32).transpose(0,2,3,1)
print data_whitened.shape
fig = plt.figure()
plt.subplot(221)
plt.imshow(data_whitened[0])
Update 2:
This is what I get when I run some of the commands given below :
As it can be seen below, toimage can show the image just fine, but trying to reshape it, messes up the image.
# output is of shape (N, 3, 32, 32)
X = X.reshape((-1,3,32,32))
# output is of shape (N, 32, 32, 3)
X = X.transpose(0,2,3,1)
# put data back into a design matrix (N, 3072)
X = X.reshape(-1, 3072)
plt.imshow(X[6].reshape(32,32,3))
plt.show()
for some wierd reason, this was what I got at first , but then after several tries, it changed to the previous image.
Let's walk through this. As you point out, CIFAR contains images which are stored in a matrix; each image is a row, and each row has 3072 columns of uint8 numbers (0-255). Images are 32x32 pixels and pixels are RGB (three channel colour).
# https://www.cs.toronto.edu/~kriz/cifar.html
# wget https://www.cs.toronto.edu/~kriz/cifar-10-python.tar.gz
# tar xf cifar-10-python.tar.gz
import numpy as np
import cPickle
with open('cifar-10-batches-py/data_batch_1') as input_file:
X = cPickle.load(input_file)
X = X['data'] # shape is (N, 3072)
It turns out that the columns are ordered a bit funny: all the red pixel values come first, then all the green pixels, then all the blue pixels. This makes it tricky to have a look at the images. This:
import matplotlib.pyplot as plt
plt.imshow(X[6].reshape(32,32,3))
plt.show()
gives this:
So, just for ease of viewing, let's shuffle the dimensions of our matrix around with reshape and transpose:
# output is of shape (N, 3, 32, 32)
X = X.reshape((-1,3,32,32))
# output is of shape (N, 32, 32, 3)
X = X.transpose(0,2,3,1)
# put data back into a design matrix (N, 3072)
X = X.reshape(-1, 3072)
Now:
plt.imshow(X[6].reshape(32,32,3))
plt.show()
gives:
OK, on to ZCA whitening. We're frequently reminded that it's super important to zero-center the data before whitening it. At this point, an observation about the code you include. From what I can tell, computer vision views color channels as just another feature dimension; there's nothing special about the separate RGB values in an image, just like there's nothing special about the separate pixel values. They're all just numeric features. So, whereas you're computing the average pixel value, respecting colour channels (i.e., your mean is a tuple of r,g,b values), we'll just compute the average image value. Note that X is a big matrix with N rows and 3072 columns. We'll treat every column as being "the same kind of thing" as every other column.
# zero-centre the data (this calculates the mean separately across
# pixels and colour channels)
X = X - X.mean(axis=0)
At this point, let's also do Global Contrast Normalization, which is quite often applied to image data. I'll use the L2 norm, which makes every image have vector magnitude 1:
X = X / np.sqrt((X ** 2).sum(axis=1))[:,None]
One could easily use something else, like the standard deviation (X = X / np.std(X, axis=0)) or min-max scaling to some interval like [-1,1].
Nearly there. At this point, we haven't greatly modified our data, since we've just shifted and scaled it (a linear transform). To display it, we need to get image data back into the range [0,1], so let's use a helper function:
def show(i):
i = i.reshape((32,32,3))
m,M = i.min(), i.max()
plt.imshow((i - m) / (M - m))
plt.show()
show(X[6])
The peacock looks slightly brighter here, but that's just because we've stretched its pixel values to fill the interval [0,1]:
ZCA whitening:
# compute the covariance of the image data
cov = np.cov(X, rowvar=True) # cov is (N, N)
# singular value decomposition
U,S,V = np.linalg.svd(cov) # U is (N, N), S is (N,)
# build the ZCA matrix
epsilon = 1e-5
zca_matrix = np.dot(U, np.dot(np.diag(1.0/np.sqrt(S + epsilon)), U.T))
# transform the image data zca_matrix is (N,N)
zca = np.dot(zca_matrix, X) # zca is (N, 3072)
Taking a look (show(zca[6])):
Now the peacock definitely looks different. You can see that the ZCA has rotated the image through colour space, so it looks like a picture on an old TV with the Tone setting out of whack. Still recognisable, though.
Presumably because of the epsilon value I used, the covariance of my transformed data isn't exactly identity, but it's fairly close:
>>> (np.cov(zca, rowvar=True).argmax(axis=1) == np.arange(zca.shape[0])).all()
True
Update 29 January
I'm not entirely sure how to sort out the issues you're having; your trouble seems to lie in the shape of your raw data at the moment, so I would advise you to sort that out first before you try to move on to zero-centring and ZCA.
One the one hand, the first plot of the four plots in your update looks good, suggesting that you've loaded up the CIFAR data in the correct way. The second plot is produced by toimage, I think, which will automagically figure out which dimension has the colour data, which is a nice trick. On the other hand, the stuff that comes after that looks weird, so it seems something is going wrong somewhere. I confess I can't quite follow the state of your script, because I suspect you're working interactively (notebook), retrying things when they don't work (more on this in a second), and that you're using code that you haven't shown in your question. In particular, I'm not sure how you're loading the CIFAR data; your screenshot shows output from some print statements (Reading training data..., etc.), and then when you copy train_data into X and print the shape of X, the shape has already been reshaped into (N, 3, 32, 32). Like I say, Update plot 1 would tend to suggest that the reshape has happened correctly. From plots 3 and 4, I think you're getting mixed up about matrix dimensions somewhere, so I'm not sure how you're doing the reshape and transpose.
Note that it's important to be careful with the reshape and transpose, for the following reason. The X = X.reshape(...) and X = X.transpose(...) code is modifying the matrix in place. If you do this multiple times (like by accident in the jupyter notebook), you will shuffle the axes of your matrix over and over, and plotting the data will start to look really weird. This image shows the progression, as we iterate the reshape and transpose operations:
This progression does not cycle back, or at least, it doesn't cycle quickly. Because of periodic regularities in the data (like the 32-pixel row structure of the images), you tend to get banding in these improperly reshape-transposed images. I'm wondering if that's what's going on in the third of your four plots in your update, which looks a lot less random than the images in the original version of your question.
The fourth plot of your update is a colour negative of the peacock. I'm not sure how you're getting that, but I can reproduce your output with:
plt.imshow(255 - X[6].reshape(32,32,3))
plt.show()
which gives:
One way you could get this is if you were using my show helper function, and you mixed up m and M, like this:
def show(i):
i = i.reshape((32,32,3))
m,M = i.min(), i.max()
plt.imshow((i - M) / (m - M)) # this will produce a negative img
plt.show()
I had the same issue: the resulting projected values are off:
A float image is supposed to be in [0-1.0] values for each
def toimage(data):
min_ = np.min(data)
max_ = np.max(data)
return (data-min_)/(max_ - min_)
NOTICE: use this function only for visualization!
However notice how the "decorrelation" or "whitening" matrix is computed #wildwilhelm
zca_matrix = np.dot(U, np.dot(np.diag(1.0/np.sqrt(S + epsilon)), U.T))
This is because the U matrix of eigen vectors of the correlation matrix it's actually this one: SVD(X) = U,S,V but U is the EigenBase of X*X not of X https://en.wikipedia.org/wiki/Singular-value_decomposition
As a final note, I would rather consider statistical units only the pixels and the RGB channels their modalities instead of Images as statistical units and pixels as modalities.
I've tryed this on the CIFAR 10 database and it works quite nicely.
IMAGE EXAMPLE: Top image has RGB values "withened", Bottom is the original
IMAGE EXAMPLE2: NO ZCA transform performances in train and loss
IMAGE EXAMPLE3: ZCA transform performances in train and loss
If you want to linearly scale the image to have zero mean and unit norm you can do the same image whitening as Tensofrlow's tf.image.per_image_standardization
. After the documentation you need to use the following formula to normalize each image independently:
(image - image_mean) / max(image_stddev, 1.0/sqrt(image_num_elements))
Keep in mind that the mean and the standard deviation should be computed over all values in the image. This means that we don't need to specify the axis/axes along which they are computed.
The way to implement that without Tensorflow is by using numpy as following:
import math
import numpy as np
from PIL import Image
# open image
image = Image.open("your_image.jpg")
image = np.array(image)
# standardize image
mean = image.mean()
stddev = image.std()
adjusted_stddev = max(stddev, 1.0/math.sqrt(image.size))
standardized_image = (image - mean) / adjusted_stddev