My goal is to distinguish true 4K images from images that have been upscaled from a lower resolution with methods such as Bicubic, Bilinear, Lanczos, EDSR and so on.
I have a dataset of 200 4K images that I have downscaled to 1080p and back to 4K with Lanczos and Bicubic interpolation.
My idea is to use the BRISQUE features of these images to identify if the images have been upscaled or not.
This is what I currently have:
I used the Python implementation of BRISQUE to extract and scale the BRISQUE features of three versions of the same image:
from brisque import BRISQUE
brisq = BRISQUE()
features = brisq.get_feature('path_to_img')
features = brisq._scale_feature(features)
# This gives me a vector of [36,], some examples shown in code block below
2 of them have been upscaled from a resolution of 1080p as labelled. From the density plot, we can clearly see that there is a difference between true 4K images and upscaled images.
My question is: now what? I have these features, and the density plot shows the distinction, but I have not been able to find a way to classify these images. I am open to any alternative methods. I will greatly appreciate any help or suggestions!
In the actual BRISQUE implementation, they have used a Support Vector Regression model to return a quality score of [0,100]. However, what I want to be able to do is for a model to be able to classify the image as upscaled/original based on these features; a vector of [36,]
What I've tried:
I tried using a Support Vector Classifier to classify these images, but I must have done something wrong as all images were classified as Upscaled. My guess is that it is due to the features being too close to one another and not dense in different areas for upscaled/original. So it seems like SVC is definitely not the way to go.
I tried adding the abs() of all the features, hoping that this would allow me to have a scatter plot of 2 dense clusters, but the features are still scattered around.
import numpy as np
from sklearn import svm

clf = svm.SVC(gamma='auto')
X = np.asarray(X).squeeze()
y = np.asarray(y)
clf = clf.fit(X, y.ravel())
print(X)
# These are the scaled feature vectors of [36,]; y contains my categories Upscaled/Original
[[-0.65742082 -0.60651108 -0.38406828 ... 0.13391201 -0.77064699
-0.66440166]
[-0.67936245 -0.66312799 -0.40825036 ... -0.04571298 -0.75259527
-0.72149044]
[-0.6176775 -0.3162819 -0.25604552 ... -0.08188693 -0.22914459
-0.04314284]
...
[-0.58745601 -0.65824511 -0.31152205 ... 0.53725558 -0.73736713
-0.40638184]
[-0.65079694 -0.84827717 -0.41251778 ... 0.40268912 -0.94145548
-0.83813568]
[-0.64831298 -0.74385767 -0.41820768 ... 0.38536 -0.83109257
-0.6435719 ]]
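One thing that may be worth checking before giving up on SVC: if every original image has both a Lanczos and a Bicubic copy, the classes are imbalanced 1:2, and an unweighted SVC with an unsuitable gamma can end up predicting the majority class for everything. Below is a minimal sketch, not a verified fix, using synthetic stand-in data (the class sizes, the shift of 0.5 and all variable names are assumptions for illustration). It standardizes the features, weights the classes, and reports cross-validated accuracy.

```python
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

rng = np.random.default_rng(0)

# Synthetic stand-in for the real BRISQUE features (assumption):
# 200 "original" and 400 "upscaled" vectors of length 36 with a small shift.
X_original = rng.normal(loc=0.0, scale=1.0, size=(200, 36))
X_upscaled = rng.normal(loc=0.5, scale=1.0, size=(400, 36))
X = np.vstack([X_original, X_upscaled])
y = np.array([0] * 200 + [1] * 400)  # 0 = original, 1 = upscaled

# Standardize each feature, then fit an RBF SVC that re-weights the classes
# inversely to their frequency so the minority class is not ignored.
clf = make_pipeline(
    StandardScaler(),
    SVC(kernel="rbf", gamma="scale", class_weight="balanced"),
)

# 5-fold cross-validation gives an accuracy estimate on held-out images.
scores = cross_val_score(clf, X, y, cv=5)
mean_accuracy = scores.mean()
print(mean_accuracy)
```

With real features, X would be the stacked [36,] vectors and y the Upscaled/Original labels. Looking at the per-class predictions (for example with a confusion matrix) shows quickly whether the model still collapses onto one class.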
I am unsure whether this kind of question (about PCA) is acceptable here.
It is commonly recommended to mean-center the data before PCA. I have two classes, each with different participants, and my aim is to distinguish and classify them. I am not sure whether mean centering should be applied to the whole data set or to each class separately.
Is it better to do it separately (and if so, should the other preprocessing steps also be applied separately), or does that not make sense?
PCA is just a rotation, optionally accompanied by a projection onto a lower-dimensional space. It finds axes of maximal variance (which happen to be the principal axes of inertia of your point cloud) and then rotates the dataset to align those axes with your coordinate system. You get to decide how many such axes you'd like to retain, which means the rotation is then followed by projection onto the first k axes of greatest variance, with k the dimensionality of the representation space you'll have chosen.
With this in mind, again like for calculating axes of inertia, you could decide to look for such axes through the center of mass of your cloud (the mean), or through any arbitrary origin of choice. In the former case, you would mean-center your data, and in the latter you may translate the data to any arbitrary point, with the result being to diminish the importance of the intrinsic cloud shape itself and increase the importance of the distance between the center of mass and the arbitrary point. Thus, in practice, you would almost always center your data.
You may also want to standardize your data (center and divide by standard deviation so as to make variance 1 on each coordinate), or even whiten your data.
In any case, you will want to apply the same transformations to the entire dataset, not class by class. If you were to apply the transformation class by class, whatever distance exists between the centers of gravity of each would be reduced to 0, and you would likely observe a collapsed representation with the two classes as overlapping. This may be interesting if you want to observe the intrinsic shape of each class, but then you would also apply PCA separately for each class.
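The collapse described above is easy to see numerically. The following is a small sketch on synthetic data (the two Gaussian classes and their means are assumptions chosen for illustration): it measures the distance between the class means after global centering and after class-by-class centering.

```python
import numpy as np

rng = np.random.default_rng(0)

# Two synthetic 2-D classes whose centers are far apart (assumption).
class_a = rng.normal(loc=[0.0, 0.0], scale=1.0, size=(100, 2))
class_b = rng.normal(loc=[5.0, 5.0], scale=1.0, size=(100, 2))
X = np.vstack([class_a, class_b])

# Global centering: one mean for the whole dataset.
X_global = X - X.mean(axis=0)
gap_global = np.linalg.norm(
    X_global[:100].mean(axis=0) - X_global[100:].mean(axis=0)
)

# Class-by-class centering: each class gets its own mean subtracted.
a_centered = class_a - class_a.mean(axis=0)
b_centered = class_b - class_b.mean(axis=0)
gap_per_class = np.linalg.norm(
    a_centered.mean(axis=0) - b_centered.mean(axis=0)
)

print(gap_global)     # distance between class means is preserved
print(gap_per_class)  # distance between class means is (numerically) zero
```

Global centering only translates the cloud, so the gap between the classes survives. Per-class centering moves both class means to the origin, which removes exactly the information a classifier would use.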
Please note that PCA may make it easier for you to visualize the two classes (without guarantees, if the data are truly n-dimensional without much of a lower-dimensional embedding). But in no circumstances would it make it easier to discriminate between the two. If anything, PCA will reduce how discriminable your classes are, and it is often the case that the projection will intermingle classes (increase ambiguity) that are otherwise quite distinct and e.g. separable with a simple hyper-surface.
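To illustrate how a projection can hurt discriminability, here is a small sketch on synthetic data (the feature scales and the use of logistic regression are assumptions for illustration). The classes are separated along a low-variance axis, so keeping only the first principal component throws the separation away.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n = 400
y = rng.integers(0, 2, size=n)

# High-variance feature that carries no class information.
x_noise = rng.normal(0.0, 10.0, size=n)
# Low-variance feature that separates the classes almost perfectly.
x_signal = (2 * y - 1) + rng.normal(0.0, 0.1, size=n)
X = np.column_stack([x_noise, x_signal])

# Training accuracy using both original features.
acc_full = LogisticRegression().fit(X, y).score(X, y)

# Training accuracy after keeping only the first principal component,
# which aligns with the high-variance noise feature.
X_pc1 = PCA(n_components=1).fit_transform(X)
acc_pc1 = LogisticRegression().fit(X_pc1, y).score(X_pc1, y)

print(acc_full, acc_pc1)
```

PCA ranks directions by variance, not by how well they separate labels, so the discriminative direction is the one that gets dropped here.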
PCA is, more or less by definition, an SVD of the centered data.
Depending on the implementation (if you use PCA from a library), the centering is applied automatically, e.g. in sklearn, because, as said, the data has to be centered by definition.
So with sklearn you do not need this preprocessing step, and in general you apply it over your whole data.
PCA is unsupervised and can be used to find a representation that is more meaningful and representative for your classes afterwards. So you need all your samples in the same feature space via the same PCA.
In short: you do the PCA once over your whole (training) data, and the data must be centered over your whole (training) data. Libraries like sklearn do the centering automatically.
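A quick way to convince yourself of this is to compare sklearn's PCA on uncentered data with a plain SVD of the manually centered data. This is a minimal sketch on random data (the data and the choice of svd_solver="full" are assumptions made to keep the comparison deterministic). The projections agree up to the sign of each component.

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 5)) + 10.0  # deliberately NOT centered

# sklearn centers internally before decomposing.
X_pca = PCA(n_components=2, svd_solver="full").fit_transform(X)

# Manual version: center over the whole dataset, then SVD.
Xc = X - X.mean(axis=0)
U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
X_svd = Xc @ Vt[:2].T  # project onto the first two right singular vectors

# Component signs are arbitrary, so compare absolute values.
same_up_to_sign = np.allclose(np.abs(X_pca), np.abs(X_svd))
print(same_up_to_sign)
```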
A k-nearest-neighbors classifier can help you distinguish between the two classes. Also try t-SNE to visualize the classes in the higher-dimensional data.
import matplotlib.pyplot as plt
from sklearn.decomposition import PCA
from sklearn.neighbors import KNeighborsClassifier

def pca_classifier(X, y, n_components=2, n_neighbors=1):
    """
    X: numpy array of shape (n_samples, n_features)
    y: numpy array of shape (n_samples, )
    n_components: int, number of components to keep (at least 2 for the plot)
    n_neighbors: int, number of neighbors to use in the knn classifier
    """
    # 1. PCA (sklearn centers the data internally)
    pca = PCA(n_components=n_components)
    X_pca = pca.fit_transform(X)
    # 2. KNN trained on the projected data
    knn = KNeighborsClassifier(n_neighbors=n_neighbors)
    knn.fit(X_pca, y)
    # 3. Plot the first two principal components, colored by class
    plt.figure(figsize=(8, 6))
    plt.scatter(X_pca[:, 0], X_pca[:, 1], c=y, cmap=plt.cm.Set1, edgecolor='k')
    plt.xlabel('PC1')
    plt.ylabel('PC2')
    plt.title('PCA')
    plt.show()
    return knn
I've trained a decision tree on a handwritten digit dataset in which each sample consists of 8 x-y points sampled along the length of the digit. The test dataset given in the assignment is MNIST, which contains the pixel intensities of 28x28 bitmap images. I need to sample 8 points along the trajectory of each digit so that the tree performs well.
I'm doing this in Python. I don't know what to do with the image to sample those points. Any package/procedure will help.
Simply index the array as you would any other array. The pixel intensities are merely ints, e.g. val = arr[3, 9].
You do not have the stroke direction in MNIST.
Hence, you cannot reliably infer such positions.
You can do the opposite though: render the stroke information as a pixel image, train a classifier on that, and then test it with MNIST.
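Rendering the stroke points as an image could look roughly like the sketch below. It assumes the 8 points are ordered along the stroke and normalized to [0, 1] with y growing downward. The render_stroke helper and the example trajectory are hypothetical, not part of any dataset tooling. It draws thin one-pixel lines, whereas real MNIST strokes are thicker and anti-aliased, so some blurring or dilation would probably be needed before the two look alike.

```python
import numpy as np

def render_stroke(points, size=28, samples_per_segment=50):
    """Rasterize ordered (x, y) points in [0, 1] into a size x size image."""
    image = np.zeros((size, size), dtype=np.uint8)
    points = np.asarray(points, dtype=float)
    # Parameter values used to sample each straight segment densely.
    t = np.linspace(0.0, 1.0, samples_per_segment)[:, None]
    for start, end in zip(points[:-1], points[1:]):
        segment = start + t * (end - start)
        # Map normalized coordinates to integer pixel indices.
        cols = np.clip(np.round(segment[:, 0] * (size - 1)).astype(int), 0, size - 1)
        rows = np.clip(np.round(segment[:, 1] * (size - 1)).astype(int), 0, size - 1)
        image[rows, cols] = 255
    return image

# Hypothetical 8-point trajectory roughly tracing a "7".
points = [(0.2, 0.2), (0.4, 0.2), (0.6, 0.2), (0.8, 0.2),
          (0.7, 0.4), (0.6, 0.6), (0.5, 0.8), (0.4, 0.95)]
image = render_stroke(points)
print(image.shape, int((image > 0).sum()))
```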
There's an MNIST sequence dataset from Edwin de Jong:
Paper: https://arxiv.org/pdf/1611.03068.pdf
Blog: https://edwin-de-jong.github.io/blog/mnist-sequence-data/
Github: https://github.com/edwin-de-jong/mnist-digits-as-stroke-sequences/
and an MNIST classification using RNN by Ryan Epp:
https://www.ryanepp.com/blog/mnist-classification-using-stroke-paths
In both projects, the direction strokes will take at a T-junction depends on the algorithm and is often counterintuitive. This means that there's more to learn for sequences since many stroke patterns will produce the same image.
I am working on an anomaly detection project on call detail records for a telephone operator. I have prepared a sample of 10000 observations and 80 dimensions, which represents all the observations for one day of traffic. The data look like this:
This is a small part of the whole dataset.
I decided to use the PyOD library, which offers many unsupervised learning algorithms, and to start with KNN:
from pyod.models.knn import KNN
knn= KNN(contamination= 0.1)
result = knn.fit_predict(conso)
Then, to visualize the result, I reduced the sample to 2 dimensions and displayed it as a scatter plot, with the observations that KNN predicted as inliers in blue and those predicted as outliers in red.
from sklearn.manifold import TSNE
result_f = TSNE(n_components = 2).fit_transform(df_final_2)
result_f = pd.DataFrame(result_f)
color= ['red' if row == 1 else 'blue' for row in result_list]
'df_final_2' is the dataframe version of 'conso'.
then I put all that in the right colors:
import matplotlib.pyplot as plt
plt.scatter(result_f[0],result_f[1], s=1, c=color)
What bothers me about the graph is that the observations predicted as outliers do not really look like outliers: normally outliers sit at the edges of the plot rather than grouped with the normal behavior. Even when I analyze these supposedly aberrant observations, they show normal behavior in the original dataset. I have tried other PyOD algorithms and changed the parameters of each one, but I got more or less the same result. I must have made a mistake somewhere, but I cannot find it.
Thnx.
There are several things to check:
When using kNN, LOF, and similar models that rely on distance measures, the data should first be standardized (e.g. using sklearn's StandardScaler).
t-SNE may not work well in this case, and the dimensionality reduction could be off.
Maybe do not use fit_predict, but do this instead (use y_train_pred):
# train kNN detector
clf_name = 'KNN'
clf = KNN(contamination=0.1)
clf.fit(X)
# get the prediction labels and outlier scores of the training data
y_train_pred = clf.labels_ # binary labels (0: inliers, 1: outliers)
y_train_scores = clf.decision_scores_ # raw outlier scores
If none of these work, feel free to open an issue report on GitHub and we will investigate further.
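To show why standardization matters for distance-based detectors, here is a small sketch that uses only scikit-learn. The knn_scores helper is hypothetical and scores each point by the distance to its k-th nearest neighbor, which is the general idea behind a kNN detector. It is a stand-in for illustration, not PyOD's implementation, and the synthetic data are an assumption.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)

# Synthetic stand-in: two features on very different scales (assumption).
# The 5 planted outliers are unusual only in the small-scale feature.
X_normal = np.column_stack([rng.normal(0, 1, 500), rng.normal(0, 1000, 500)])
X_outliers = np.column_stack([rng.normal(10, 0.1, 5), rng.normal(0, 1000, 5)])
X = np.vstack([X_normal, X_outliers])
planted = set(range(500, 505))

def knn_scores(data, k=5):
    """Outlier score = distance to the k-th nearest neighbor (self excluded)."""
    nn = NearestNeighbors(n_neighbors=k + 1).fit(data)
    distances, _ = nn.kneighbors(data)
    return distances[:, -1]

# Without scaling, the large-scale feature dominates every distance.
scores_raw = knn_scores(X)
# With scaling, both features contribute comparably.
scores_scaled = knn_scores(StandardScaler().fit_transform(X))

top_raw = set(np.argsort(scores_raw)[-5:].tolist())
top_scaled = set(np.argsort(scores_scaled)[-5:].tolist())
print(len(top_raw & planted), len(top_scaled & planted))
```

With 80 columns on different scales, unscaled distances are effectively decided by a handful of large-valued columns, which may be why the flagged points look ordinary.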
I'm implementing a CNN with Theano. According to the paper, I have to apply this image preprocessing before training the CNN:
We extracted RGB patches of 61x61 dimensions associated with each poselet activation, subtracted the mean and used this data to train the convnet model shown in Table 1
Can you tell me what "subtracted the mean" means? Please tell me whether these steps are correct (this is what I understood):
1) Compute the mean of the red channel, the green channel and the blue channel over the whole image
2) For each pixel, subtract the red-channel mean from the red value, the green-channel mean from the green value, and likewise for the blue channel
3) Is it correct to have negative values, or do I have to use abs?
Thanks all!!
You should read the paper carefully, but most probably they mean the mean of the patches. You have N patches of 61x61 pixels, each equivalent to a vector of length 61^2 (or 3*61^2 with three channels). They simply compute the mean of each dimension, i.e. the mean over these N vectors for each of the 3*61^2 dimensions. The result is a mean vector of length 3*61^2 (or a mean matrix/mean patch, if you prefer), which they subtract from all N patches. The resulting patches will contain negative values; this is perfectly fine, and you should not take the absolute value. Neural networks prefer this kind of data.
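In NumPy the mean-patch subtraction described above could be sketched like this. The random array stands in for the real patches, and N = 100 is an arbitrary assumption.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in for N RGB patches of 61x61 pixels (assumption: N = 100).
patches = rng.integers(0, 256, size=(100, 61, 61, 3)).astype(np.float32)

# Mean over the N patches for every pixel position and channel.
mean_patch = patches.mean(axis=0)  # shape (61, 61, 3)

# Subtract the same mean patch from every patch; negative values are expected.
centered = patches - mean_patch

print(mean_patch.shape, float(centered.min()), float(centered.max()))
```

The same mean_patch, computed on the training set, would then also be subtracted from validation and test patches.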
I would assume the mean mentioned in the paper is the mean over all images used in the training set (computed separately for each channel).
Several indications:
Caffe is a lib for ConvNets. In their tutorial they mention the compute image mean part: http://caffe.berkeleyvision.org/gathered/examples/imagenet.html
For this they use the following script: https://github.com/BVLC/caffe/blob/master/examples/imagenet/make_imagenet_mean.sh
which does what I indicated.
Google played around with ConvNets and published their code here: https://github.com/google/deepdream/blob/master/dream.ipynb and they do also use the mean of the training set.
This is of course only indirect evidence, since I cannot explain why this is done. In fact, I stumbled over this question while trying to figure out precisely that.
//EDIT:
In the meantime I found a source confirming my claim (highlighting added by me):
There are three common forms of data preprocessing a data matrix X [...]
Mean subtraction is the most common form of preprocessing. It
involves subtracting the mean across every individual feature in the
data, and has the geometric interpretation of centering the cloud of
data around the origin along every dimension. In numpy, this operation
would be implemented as: X -= np.mean(X, axis = 0). With images
specifically, for convenience it can be common to subtract a single
value from all pixels (e.g. X -= np.mean(X)), or to do so separately
across the three color channels.
As we can see, the whole data is used to compute the mean.
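The two image-specific variants mentioned in the quote (a single scalar mean, or one mean per color channel) could be sketched as follows. The random image stack and its shape are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in for a training set of 50 RGB images of 32x32 pixels.
images = rng.integers(0, 256, size=(50, 32, 32, 3)).astype(np.float64)

# Variant 1: one scalar mean over all images, pixels and channels.
scalar_mean = images.mean()
centered_scalar = images - scalar_mean

# Variant 2: one mean per color channel, computed over the whole training set.
channel_mean = images.mean(axis=(0, 1, 2))  # shape (3,)
centered_channel = images - channel_mean    # broadcasts over the last axis

print(scalar_mean, channel_mean)
```

In both variants the statistics come from the whole training set, not from each image individually.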
I want to train an SVM for object detection. At this point I have a Python script which detects FAST keypoints and extracts BRIEF features at those locations.
Now I don't know how to use these descriptors to train an SVM.
Could you please tell me:
How do I use the descriptors to train the SVM (as far as I know, these descriptors should be my training data)?
What are labels used for, and how can I get them?
To train an SVM you need a matrix X with your features and a vector y with your labels. It should look like this for 3 images and two features:
>>> from sklearn import svm
>>> X = [[0, 0],   # negative (0)
...      [1, 3],   # positive (1)
...      [2, 5]]   # negative (0)
>>> y = [0,
...      1,
...      0]
>>> model = svm.SVC()
>>> model.fit(X, y)
The training set would consist of several images, each image would be a row of X and y.
Labels:
For the labels y you need positive and negative examples (0 or 1):
Positive Samples
You can specify positive samples in two ways. One way is to specify
rectangular regions in a larger image. The regions contain the objects
of interest. The other approach is to crop out the object of interest
from the image and save it as a separate image. Then, you can specify
the region to be the entire image. You can also generate more positive
samples from existing ones by adding rotation or noise, or by varying
brightness or contrast.
Negative Samples
Images that do not contain objects of interest.
[slightly edited from here]
Feature matrix X:
Here you can get creative but I will mention a simple idea. Make height * width features, one for each pixel of each image, but make them all 0 except in a small region around the FAST keypoints. In the end your X matrix will have dimension (n_images, height*width).
Another commonly used idea is Bag of Words. The X matrix must have a fixed number of features/columns and the number of keypoints is variable. This is a representation problem but it can be solved binning them in a histogram with a fixed number of bins. For details see for example this paper.
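A rough sketch of the Bag of Words idea is shown below. Random binary vectors stand in for real BRIEF descriptors, and the vocabulary size, image count and labels are all assumptions. It clusters every descriptor into a small "visual vocabulary" and then describes each image by a normalized histogram of the words it contains. Using Euclidean k-means on binary descriptors is a simplification, since Hamming distance is the more natural choice for BRIEF.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.svm import SVC

rng = np.random.default_rng(0)

# Stand-in for BRIEF output: each image has a different number of
# 256-bit descriptors (random bits here, purely for illustration).
descriptors_per_image = [
    rng.integers(0, 2, size=(n, 256)).astype(float)
    for n in (40, 55, 30, 70, 25, 60)
]
labels = np.array([0, 1, 0, 1, 0, 1])  # hypothetical negative/positive labels

# Build the visual vocabulary from all descriptors of all training images.
n_words = 8
kmeans = KMeans(n_clusters=n_words, n_init=10, random_state=0)
kmeans.fit(np.vstack(descriptors_per_image))

def bow_histogram(descriptors):
    """Fixed-length representation: normalized counts of visual words."""
    words = kmeans.predict(descriptors)
    hist = np.bincount(words, minlength=n_words).astype(float)
    return hist / hist.sum()

# One row per image, one column per visual word.
X = np.array([bow_histogram(d) for d in descriptors_per_image])
model = SVC().fit(X, labels)
print(X.shape)
```

Every image now maps to a vector of the same length regardless of how many keypoints were detected, which is what the SVM needs.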
You will have to consult the specialized literature to come up with more ways to incorporate the BRIEF features but I hope this will give you an idea on how to get started.