Don't show zero values on 2D heat map - python

I want to plot a 2D map of the dies on a silicon wafer. Hence only the center portion has values, and the corners have the value 0. I'm using matplotlib's plt.imshow to obtain a simple map as follows:
import numpy as np
import matplotlib.pyplot as plt

data = np.array([[ 0. , 0. , 1. , 1. , 0. , 0. ],
[ 0. , 1. , 1. , 1. , 1. , 0. ],
[ 1. , 2. , 0.1, 2. , 2. , 1. ],
[ 1. , 2. , 2. , 0.1, 2. , 1. ],
[ 0. , 1. , 1. , 1. , 1. , 0. ],
[ 0. , 0. , 1. , 1. , 0. , 0. ]])
plt.figure(1)
plt.imshow(data, interpolation='none')
plt.colorbar()
And I obtain the following map:
Is there any way to remove the dark blue areas where the values are zero, while retaining the shape of the 'wafer' (the green, red and lighter blue areas)? Meaning the corners would be white space while the remainder retains the color configuration.
Or is there a better function I could use to obtain this?

There are two ways to get rid of the dark blue corners:
You can flag the data with zero values:
data[data == 0] = np.nan
plt.imshow(data, interpolation = 'none', vmin = 0)
Or you can create a masked array for imshow:
data_masked = np.ma.masked_where(data == 0, data)
plt.imshow(data_masked, interpolation = 'none', vmin = 0)
The two solutions above both solve your problem, although the use of masks is a bit more general.
If you want to retain the exact color configuration you need to manually set the vmin/vmax arguments for plotting the image. Passing vmin = 0 to plt.imshow above makes sure that the discarded zeros still show up on the color bar.
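For completeness, here is a minimal end-to-end sketch of the masked-array approach that also renders the masked corners in white. The colormap choice and the set_bad call are my own additions (set_bad controls how masked cells are drawn; Colormap.copy needs matplotlib 3.4+), not part of the answer above:
import numpy as np
import matplotlib.pyplot as plt

data = np.array([[0. , 0. , 1. , 1. , 0. , 0. ],
                 [0. , 1. , 1. , 1. , 1. , 0. ],
                 [1. , 2. , 0.1, 2. , 2. , 1. ],
                 [1. , 2. , 2. , 0.1, 2. , 1. ],
                 [0. , 1. , 1. , 1. , 1. , 0. ],
                 [0. , 0. , 1. , 1. , 0. , 0. ]])

# Mask the zero-valued corners so imshow skips them entirely.
data_masked = np.ma.masked_where(data == 0, data)

# Copy a colormap and draw masked (bad) cells as white.
cmap = plt.get_cmap('viridis').copy()
cmap.set_bad(color='white')

plt.imshow(data_masked, interpolation='none', cmap=cmap, vmin=0)
plt.colorbar()
plt.show()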

Related

Python kernel crash in Jupyter Notebook when calling TSNE.fit_transform()

I have the output of sklearn's tf-idf, which I want to visualize with t-SNE. However, when calling fit_transform on sklearn's TSNE object, I get this error message:
"Canceled future for execute_request message before replies were done
The Kernel crashed while executing code in the the current cell or a
previous cell. Please review the code in the cell(s) to identify a
possible cause of the failure. Click here for more info. View Jupyter
log for further details."
Why is this happening? Code below.
import numpy as np

dense = np.array(
[[0. , 0. , 0. , 0. , 0. ],
[0. , 0. , 0. , 0. , 0. ],
[0. , 0. , 0. , 0. , 0. ],
[0. , 0. , 0. , 0. , 0. ],
[0. , 0. , 0. , 1. , 0. ],
[0. , 0. , 0. , 1. , 0. ],
[0.70710678, 0.70710678, 0. , 0. , 0. ],
[0. , 0. , 0.70710678, 0. , 0.70710678],
[0.70710678, 0.70710678, 0. , 0. , 0. ],
[0. , 0. , 0. , 0. , 0. ],
[0. , 0. , 0.70710678, 0. , 0.70710678]])
from sklearn.manifold import TSNE
tsne = TSNE(n_components = 2, verbose = 1, perplexity = 50, n_iter = 1000)
results = tsne.fit_transform(dense)
I wasn't able to reproduce the error in a Google Colab. It works fine on my end, with the following output:
[t-SNE] Computing 10 nearest neighbors...
[t-SNE] Indexed 11 samples in 0.000s...
[t-SNE] Computed neighbors for 11 samples in 0.009s...
[t-SNE] Computed conditional probabilities for sample 11 / 11
[t-SNE] Mean sigma: 1125899906842624.000000
[t-SNE] KL divergence after 250 iterations with early exaggeration: 39.474655
[t-SNE] KL divergence after 1000 iterations: 0.268328
I've found an old thread on GitHub that may address the problem. It describes a Mac-related issue, but I don't know which OS your machine runs.
There is a chance that the error has been fixed in newer versions of sklearn, so my first suggestion is to try upgrading, if you haven't already.
If the issue persists, then since the problem may be due to a dependency that sklearn uses (so even if you are not on a Mac, you may still be affected), I would recommend using a different library. I know of python-bhtsne, which can be used in much the same way as sklearn's.
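If you want to check what you are running before deciding whether to upgrade, a quick generic sketch (nothing here is specific to this bug):
import sklearn
print(sklearn.__version__)
# If it is outdated, upgrade from a shell with:
#   pip install --upgrade scikit-learn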

Slice mesh with trimesh

I'm working on a big .stl file which I want to cut into pieces using a bounding box.
For this purpose, I use the trimesh Python package to load the .stl.
Here is the piece of code used to generate the bounding box:
box = trimesh.creation.box(extents=[1.5, 1.5, 1.5])
print(box.facets_origin)
print(box.facets_normal)
This returns:
print(box.facets_origin)
[[-0.75 -0.75 0.75]
[ 0.75 -0.75 -0.75]
[-0.75 0.75 -0.75]
[-0.75 -0.75 0.75]
[-0.75 0.75 0.75]
[ 0.75 0.75 -0.75]]
print(box.facets_normal)
[[-1. 0. 0.]
[ 0. -1. 0.]
[ 0. 0. -1.]
[ 0. 0. 1.]
[ 0. 1. 0.]
[ 1. 0. 0.]]
This means that the box's center of gravity is at (0, 0, 0).
I then plan to cut the big .stl using the slice_plane function.
However, I would like to change the location of the bounding box's center of mass, or of its facets.
How could this be done using trimesh, or another Python package?
Thanks in advance for your help!
Joachim
Can you not translate the box using the following?
mesh.apply_transform(trimesh.transformations.scale_and_translate())
https://github.com/mikedh/trimesh/blob/master/trimesh/transformations.py
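As a hedged sketch of that idea: trimesh also provides apply_translation, which is enough if you only need to move the box (the offset and the commented slicing call are illustrative, and the sign convention of the plane normal may need checking against your geometry):
import trimesh

# Build the box centered at the origin, then shift its center of mass.
box = trimesh.creation.box(extents=[1.5, 1.5, 1.5])
box.apply_translation([10.0, 5.0, 0.0])  # illustrative offset

print(box.centroid)  # now approximately [10, 5, 0]

# The facet origins and normals move with the mesh, so they can then be
# fed to slice_plane on the big mesh, one facet at a time, e.g.:
# piece = big_mesh.slice_plane(plane_origin=box.facets_origin[0],
#                              plane_normal=-box.facets_normal[0])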

Question on discrete convolution with python

I am struggling to understand why the np.convolve method returns an array of length N+M-1. I would appreciate your help.
Suppose I have two discrete probability distributions with values of [1,2] and [10,12] and probabilities of [.5,0.2] and [.5,0.4] respectively.
Using numpy's convolve function I get:
>>In[]: np.convolve([.5,0.2],[.5,0.4])
>>Out[]: array([0.25, 0.3 , 0.08])
However, I don't understand why the resulting probability distribution only has 3 data points. To my understanding, the sum of my input variables can take the following values: [11, 12, 13, 14], so I would expect 4 data points to reflect the probabilities of each of these outcomes.
What am I missing?
I have managed to find the answer to my own question after understanding convolution a bit better. Posting it here for anyone wondering:
Effectively, the convolution of the two "signals" or probability functions in my example above is not done correctly, because nothing reflects that the events [1,2] of the first distribution and [10,12] of the second do not coincide.
Simply calling np.convolve([.5,0.2],[.5,0.4]) assumes the probabilities correspond to the same events (e.g. [1,2] and [1,2]).
The correct approach is to bring the two series into alignment on a common x-axis, x ∈ [1,12], as below:
>>In[]: vector1 = [.5,0.2, 0,0,0,0,0,0,0,0,0,0]
>>In[]: vector2 = [0,0,0,0,0,0,0,0,0,.5, 0,0.4]
>>In[]: np.convolve(vector1, vector2)
>>Out[]: array([0. , 0. , 0. , 0. , 0. , 0. , 0. , 0. , 0. , 0.25, 0.1 ,
0.2 , 0.08, 0. , 0. , 0. , 0. , 0. , 0. , 0. , 0. , 0. ,
0. ])
which gives the correct probabilities (0.25, 0.1, 0.2 and 0.08) for the sums 11, 12, 13 and 14.
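An equivalent, more compact way to keep track of the supports (a sketch assuming small non-negative integer values; dense_pmf is a hypothetical helper of my own, not a numpy function) is to place each distribution on a dense integer grid, so that the array index itself encodes the value:
import numpy as np

def dense_pmf(values, probs):
    # Array indexed by integer value, zero elsewhere.
    out = np.zeros(max(values) + 1)
    out[list(values)] = probs
    return out

# Index k of the convolution then corresponds directly to the sum k.
pmf_sum = np.convolve(dense_pmf([1, 2], [0.5, 0.2]),
                      dense_pmf([10, 12], [0.5, 0.4]))
support = np.nonzero(pmf_sum)[0]
for value, prob in zip(support, pmf_sum[support]):
    print(int(value), round(float(prob), 4))
# 11 0.25, 12 0.1, 13 0.2, 14 0.08 (up to float rounding)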

How to interpolate cumulative histogram data?

I have got a set of histograms from numpy.histogram:
probas, years = zip(*[np.histogram(r, bins=bin_values) for r in results])
results is an array of shape (9, 10000). The bin values are the years from 2029 to 2066. The probas array has shape (9, 37) and the years array (9, 38), so years[:,:-1] has shape (9, 37).
I can obtain the cumulative histogram data using:
probas = np.cumsum(probas, axis=1)
I can then normalize it to [0,1]:
probas = np.asarray(probas)
probas = probas/np.max(probas, axis = 0)
I then try and interpolate that cumulative distribution using scipy:
inverse_pdfs = [scipy.interpolate.interp1d(probas[i], years[i,:-1]) for i in range(probas.shape[0])]
When I plot the third histogram of the data set with plt.plot(), together with the corresponding curve from inverse_pdfs, using:
i = 2
plt.plot(years[i,:-1], probas[i], color="orange")
probability_range = np.arange(0.,1.01,0.01)
plt.plot([inverse_pdfs[i](p) for p in probability_range], probability_range, color="blue")
I obtain:
As you can see, the match is pretty good for most of the years after 2042, but before that it is very bad.
Any suggestion on how to improve that match, or where the problem comes from, would be very welcome.
For information, the data used to train the interpolator on the third histogram are:
years[2,:-1]: [2029. 2030. 2031. 2032. 2033. 2034. 2035. 2036. 2037. 2038. 2039. 2040.
2041. 2042. 2043. 2044. 2045. 2046. 2047. 2048. 2049. 2050. 2051. 2052.
2053. 2054. 2055. 2056. 2057. 2058. 2059. 2060. 2061. 2062. 2063. 2064.
2065.]
probas[2]:[0. 0. 0. 0. 0. 0. 0. 0. 0. 0.
0. 0. 0. 0.0916 0.2968 0.4888 0.6666 0.8335 0.9683 1.
1. 1. 1. 1. 1. 1. 1. 1. 1. 1.
1. 1. 1. 1. 1. 1. 1. ]
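One likely culprit worth checking (my own note, not taken from the original thread): interp1d effectively needs distinct, increasing x values, and the long runs of repeated 0s and 1s in this cumulative histogram violate that. A minimal sketch on the data above, keeping only the first occurrence of each CDF value (which endpoint of each flat run to keep is a modeling choice):
import numpy as np
from scipy import interpolate

years = np.arange(2029., 2066.)  # the 37 bin values quoted above
probas = np.array([0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
                   0.0916, 0.2968, 0.4888, 0.6666, 0.8335, 0.9683,
                   1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1.,
                   1., 1., 1., 1., 1.])

# Drop repeated CDF values so the x-axis of the inverse is strictly increasing.
uniq_p, idx = np.unique(probas, return_index=True)
inverse_cdf = interpolate.interp1d(uniq_p, years[idx])

print(inverse_cdf(0.5))  # about 2044.06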

Sklearn digits dataset

import matplotlib.pyplot as plt
from sklearn import datasets
from sklearn import svm
digits = datasets.load_digits()
print(digits.data)
classifier = svm.SVC(gamma=0.4, C=100)
x, y = digits.data[:-1], digits.target[:-1]
x = x.reshape(1,-1)
y = y.reshape(-1,1)
print((x))
classifier.fit(x, y)
###
print('Prediction:', classifier.predict(digits.data[-3]))
###
plt.imshow(digits.images[-1], cmap=plt.cm.gray_r, interpolation='nearest')
plt.show()
I have reshaped x and y as well, but I'm still getting an error saying:
Found input variables with inconsistent numbers of samples: [1, 1796]
y is a 1-d array with 1796 elements, whereas x has many. Why does it show 1 for x?
Actually scrap what I suggested below:
This link describes the general dataset API. The attribute data is a 2d array of each image, already flattened:
import sklearn.datasets
digits = sklearn.datasets.load_digits()
digits.data.shape
#: (1797, 64)
This is all you need to provide, no reshaping required. Similarly, the attribute target is a 1d array of labels:
digits.target.shape
#: (1797,)
No reshaping necessary. Just split into training and testing and run with it.
Try printing x.shape and y.shape. I feel that you're going to find something like (1, ...) and (1796, ...) respectively. When calling fit, scikit-learn classifiers expect X and y to agree on the number of samples (the first dimension).
The clue: why are the arguments the opposite way around in the two reshape calls?
x = x.reshape(1, -1)
y = y.reshape(-1, 1)
Maybe try:
x = x.reshape(-1, 1)
Completely unrelated to your question, but you're predicting on digits.data[-3] when the only element left out of the training set is digits.data[-1]. Not sure if that was intentional.
Regardless, it could be good to check your classifier over more results using the scikit metrics package. This page has an example of using it over the digits dataset.
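A hedged sketch of that suggestion, using train_test_split and a classification report (the split ratio and the SVC parameters are my own illustrative choices, not taken from the linked page):
from sklearn import datasets, svm, metrics
from sklearn.model_selection import train_test_split

digits = datasets.load_digits()
X_train, X_test, y_train, y_test = train_test_split(
    digits.data, digits.target, test_size=0.25, random_state=0)

# Fit on the training split only, then score on held-out data.
clf = svm.SVC(gamma=0.001, C=100).fit(X_train, y_train)
print(metrics.classification_report(y_test, clf.predict(X_test)))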
The reshaping will transform each 8x8 matrix into a 1-dimensional vector, which can be used as a feature vector. You need to reshape the entire X array, not only the training data, since the samples you use for prediction need to have the same format.
The following code shows how:
import matplotlib.pyplot as plt
from sklearn import datasets
from sklearn import svm
digits = datasets.load_digits()
classifier = svm.SVC(gamma=0.4, C=100)
x, y = digits.images, digits.target
#only reshape X since its a 8x8 matrix and needs to be flattened
n_samples = len(digits.images)
x = x.reshape((n_samples, -1))
print("before reshape:" + str(digits.images[0]))
print("After reshape" + str(x[0]))
classifier.fit(x[:-2], y[:-2])
###
print('Prediction:', classifier.predict(x[[-2]]))  # x[[-2]] keeps the 2D shape predict expects
###
plt.imshow(digits.images[-2], cmap=plt.cm.gray_r, interpolation='nearest')
plt.show()
###
print('Prediction:', classifier.predict(x[[-1]]))
###
plt.imshow(digits.images[-1], cmap=plt.cm.gray_r, interpolation='nearest')
plt.show()
It will output:
before reshape:[[ 0. 0. 5. 13. 9. 1. 0. 0.]
[ 0. 0. 13. 15. 10. 15. 5. 0.]
[ 0. 3. 15. 2. 0. 11. 8. 0.]
[ 0. 4. 12. 0. 0. 8. 8. 0.]
[ 0. 5. 8. 0. 0. 9. 8. 0.]
[ 0. 4. 11. 0. 1. 12. 7. 0.]
[ 0. 2. 14. 5. 10. 12. 0. 0.]
[ 0. 0. 6. 13. 10. 0. 0. 0.]]
After reshape[ 0. 0. 5. 13. 9. 1. 0. 0. 0. 0. 13. 15. 10. 15. 5.
0. 0. 3. 15. 2. 0. 11. 8. 0. 0. 4. 12. 0. 0. 8.
8. 0. 0. 5. 8. 0. 0. 9. 8. 0. 0. 4. 11. 0. 1.
12. 7. 0. 0. 2. 14. 5. 10. 12. 0. 0. 0. 0. 6. 13.
10. 0. 0. 0.]
And a correct prediction for the last two images, which weren't used for training; you can of course choose to make a bigger split between training and test sets.
