Python: Plot a sparse matrix - python

I have a sparse matrix X, shape (6000, 300). I'd like something like a scatterplot which has a dot where the X(i, j) != 0, and blank space otherwise. I don't know how many nonzero entries there are in each row of X. X[0] has 15 nonzero entries, X[1] has 3, etc. The maximum number of nonzero entries in a row is 16.
Attempts:
plt.imshow(X) results in a tall, skinny graph because of the shape of X. Using plt.imshow(X, aspect='auto') stretches the graph horizontally, but the dots get stretched into ellipses and the plot becomes hard to read.
ax.spy suffers from the same problem.
bokeh seems promising, but really taxes my jupyter kernel.
Bonus:
The nonzero entries of X are positive real numbers. If there was some way to reflect their magnitude, that would be great as well (e.g. colour intensity, transparency, or across a colour bar).
Every 500 rows of X belong to the same class. That's 12 classes * 500 observations (rows) per class = 6000 rows. E.g. X[:500] are from class A, X[500:1000] are from class B, etc. Would be nice to colour-code the dots by class. For the moment I'll settle for manually including horizontal lines every 500 rows to delineate between classes.

You can use nonzero() to find the nonzero elements and scatter() to plot the points:
import pylab as pl
import numpy as np
a = np.random.rand(6000, 300)
a[a < 0.9999] = 0
r, c = np.nonzero(a)
pl.scatter(r, c, c=a[r, c])
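If X is a scipy.sparse matrix rather than a dense array, the same idea can be sketched without densifying it. This is only a sketch under assumptions: the random matrix stands in for the real X, and the marker size, colormap and figure size are arbitrary choices. Scatter markers have a fixed size in points, so they stay round whatever the aspect ratio of the axes.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line in a notebook
import matplotlib.pyplot as plt
from scipy import sparse

# Stand-in for X: a random 6000 x 300 sparse matrix with about 1% nonzeros.
X = sparse.random(6000, 300, density=0.01, format="coo", random_state=0)

coo = X.tocoo()  # COO format exposes row indices, column indices and values
fig, ax = plt.subplots(figsize=(6, 10))
points = ax.scatter(coo.col, coo.row, c=coo.data, s=2, cmap="viridis")
ax.invert_yaxis()  # put row 0 at the top, as imshow and spy do

# Horizontal separators between the 12 classes of 500 rows each.
n_rows, rows_per_class = X.shape[0], 500
boundaries = list(range(rows_per_class, n_rows, rows_per_class))
for b in boundaries:
    ax.axhline(b, color="0.6", linewidth=0.5)

fig.colorbar(points, ax=ax, label="value of X[i, j]")
ax.set_xlabel("column j")
ax.set_ylabel("row i")
```

The colour bar reflects the magnitude of each nonzero entry, and the grey lines mark the class boundaries from the question.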

It seems to me a heatmap is the best candidate for this type of plot. imshow() gives you a colored matrix, and colorbar() adds the color scale legend.
I don't get your stretched-ellipses problem; shouldn't each data point be a colored square?
You can try a log color scale if the data is sparse. Also plot the 12 classes separately to analyze whether there are any inter-class differences.

plt.matshow also turned out to be a feasible solution. I could also plot a heatmap with colorbars and all that.

Related

Matrix normalization over multiple runs, what does this code do?

I have several numpy matrices collected over some time. I now want to visualize these matrices and explore visual similarities among them. The matrices contain small numbers from 0.0 to 1.0.
To compare them, I want to ensure that the same "areas" get colored with the same color, e.g. that 0.01 to 0.02 always is red, and 0.02 to 0.03 always is green. I have two questions:
I found another question which has this code snippet:
a = np.random.normal(0.0,0.5,size=(5000,10))**2
a = a/np.sum(a,axis=1)[:,None] # Normalize
plt.pcolor(a)
What is the effect of the second line, specifically the [:,None] part? I tried normalizing a matrix by:
max_a = a/10  # Normalize
print(max_a.shape)
plt.pcolor(max_a)
but there is not much visual difference compared to the visualization for the unnormalized matrix. When I then add the [:,None] statement I get an error
ValueError: too many values to unpack (expected 2)
which is expected since the shape now is 10,1,10. I therefore want to know what the brackets do and how to read the statement.
Secondly, and related, I want to make sure that I can visually compare the matrices. I therefore want to fix the "colorization", i.e. the ranges in which a color is green or red, so that I do not end up with 0 to 0.1 as green in plot A and with 0 to 0.1 as red in plot B. How can I fix the "translation" from floats to colors? Do I have to normalize each matrix with the same constant, e.g. 10? Or do I normalize them with a unique value? Do I even need normalization here?
[:,None] adds a new axis so that each row can be divided by its own sum; it is the same as np.sum(a,axis=1)[:,np.newaxis]. Summing over the columns with np.sum(a,axis=1) gives a 1-D array with shape (5000,), but to normalize the (5000,10) matrix by these row sums through broadcasting you need a 2-D array with shape (5000,1). That is why the new axis is needed.
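To see the shapes involved, here is a small sketch using the array from the question. The names row_sums, column and b are only illustrative.

```python
import numpy as np

a = np.random.normal(0.0, 0.5, size=(5000, 10)) ** 2

row_sums = np.sum(a, axis=1)   # shape (5000,): one sum per row
column = row_sums[:, None]     # shape (5000, 1): same values, one extra axis
normalized = a / column        # (5000, 10) / (5000, 1) broadcasts row by row

print(row_sums.shape, column.shape, normalized.shape)

# Indexing a 2-D array the same way inserts the new axis in the middle,
# which is where the (10, 1, 10) shape from the question comes from.
b = np.zeros((10, 10))
print(b[:, None].shape)
```

After the division every row of normalized sums to 1, which is all the "Normalize" line in the snippet does.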
You can have fixed colors by fixing the scale of your colormap: plt.pcolor(max_a,vmin=0,vmax=1)
Adding a discrete colorbar might also help:
import matplotlib.pyplot as plt
cmap = plt.get_cmap('jet', 10)  # 'jet' resampled to 10 discrete colors
plt.pcolor(a,cmap=cmap,vmin=0,vmax=1)
plt.colorbar()
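For comparing several matrices side by side, one option is to share a single Normalize object between the plots, so that a given value always maps to the same color. A minimal sketch, assuming two hypothetical matrices run_a and run_b with values in [0, 1]:

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line in a notebook
import matplotlib.pyplot as plt
import numpy as np
from matplotlib.colors import Normalize

rng = np.random.default_rng(0)
run_a = rng.random((20, 10)) * 0.3   # hypothetical matrices from two runs
run_b = rng.random((20, 10))

norm = Normalize(vmin=0.0, vmax=1.0)  # one value-to-color mapping for every plot
cmap = plt.get_cmap("viridis", 10)    # 10 discrete bins, each 0.1 wide

fig, axes = plt.subplots(1, 2, figsize=(8, 4))
mesh_a = axes[0].pcolormesh(run_a, cmap=cmap, norm=norm)
mesh_b = axes[1].pcolormesh(run_b, cmap=cmap, norm=norm)
fig.colorbar(mesh_b, ax=axes, label="value")
```

With the scale fixed like this there is no need to rescale the data itself; run_a simply uses only the lower part of the color range.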

Using python to plot a heat map from five arrays: x,y and 3 arrays indicating RGB

I have 2 arrays, x and y, respectively representing each point's coordinate on a 2D plane. I also have another 3 arrays of the same length as x and y. These three arrays represent the RGB values of a color. Therefore, each point in x,y correspond to a color indicated by the RGB arrays. In Python, how can I plot a heat map with x,y as its axes and colors from the three RGB arrays? Each array is, say, 1000 in length.
As an example that takes the first 10 points, I have:
x = [10.946028, 16.229064, -36.855, -38.719057, 11.231684, 33.256904999999996, -41.21, 12.294958, 16.113228, -43.429027000000005]
y = [-21.003803, 4.5, 4.5, -22.135853, 4.084630000000001, 17.860079000000002, -18.083685, -3.98297, -19.565272, 0.877016]
R = [0,1,2,3,4,5,6,7,8,9]
G = [2,4,6,8,10,12,14,16,18,20]
B = [0,255,0,255,0,255,0,255,0,255]
I'd like to draw a heat map that, for example, the first point would have the coordinates (10.946028,-21.003803) and has a color of R=0,G=2,B=0. The second point would have the coordinates (16.229064, 4.5) and has a color of R=1,G=4,B=255.
OK, it seems like you want your own colormap for your heatmap. You can write your own, or just use one of matplotlib's built-in colormaps. Check out this post for the use of heatmaps with matplotlib. If you want to do it on your own, the easiest way is to recombine the five one-dimensional vectors into a 3-D RGB image. Afterwards you have to define a mapping function that combines the R, G and B values into a single new value for every pixel, like:
f(R,G,B) = a*R +b*G + c*B
a, b and c can be whatever you like; the formula can be far more complex, but you have to decide how the values should relate to each other. From that you get a 2-D matrix filled with the values of your function f(R,G,B). Now you have to define which value of this new matrix gets which color. This can be a linear mapping written by hand (like a list: 0 = deep blue, 1 = light red, ...). Using this look-up table you can get your own specific heatmap. But as you can see, that path takes some time, so I would recommend not doing it and just using one of matplotlib's built-in colormaps. Example:
import matplotlib.pyplot as plt
import numpy as np
a = np.random.random((16, 16))
plt.imshow(a, cmap='hot', interpolation='nearest')
plt.show()
You can use other colormaps by changing the string passed as cmap='hot' to another name from that list. Hope that helps.
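If the goal is simply to draw each point in its own RGB color, a different route from the colormap approach above is to pass the colors straight to scatter. This is a sketch that assumes the R, G and B arrays hold 0-255 channel values; matplotlib expects floats in [0, 1], hence the division. The marker size is arbitrary.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line in a notebook
import matplotlib.pyplot as plt
import numpy as np

x = [10.946028, 16.229064, -36.855, -38.719057, 11.231684,
     33.256904999999996, -41.21, 12.294958, 16.113228, -43.429027000000005]
y = [-21.003803, 4.5, 4.5, -22.135853, 4.084630000000001,
     17.860079000000002, -18.083685, -3.98297, -19.565272, 0.877016]
R = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
G = [2, 4, 6, 8, 10, 12, 14, 16, 18, 20]
B = [0, 255, 0, 255, 0, 255, 0, 255, 0, 255]

# One RGB triple per point, scaled from 0-255 to 0-1: shape (10, 3).
colors = np.column_stack([R, G, B]) / 255.0

fig, ax = plt.subplots()
ax.scatter(x, y, c=colors, s=40)
ax.set_xlabel("x")
ax.set_ylabel("y")
```

No colormap or colorbar is involved here, because each point already carries its final color.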

Interpolating to get rid of NANs and contour plot

I have these arrays that I need to interpolate and make the smoothest possible interpolation:
x = time
y = height
z = latitude
print(np.shape(x))
print(np.shape(y))
print(np.shape(z))
Result:
(99, 25)
(99, 25)
(99, 25)
y is altitude and it's not uniform. It has a bunch of NaNs, even though the columns are all the same size (a variable n_alt holds the number of altitudes, which for this example is 99).
x is time and it's uniform all the way through (all the values in one column of that array are the same).
z is latitude and it's the actual 'z'; it's an array with as many columns as there are time points and as many rows as there are altitude points.
I want to interpolate in 2D (the data set has series of NaNs in both x and y directions) to fill the gaps in the data, since several files will cover a certain altitude range and not others.
My questions are:
1) is there a good way to fill the gaps the 2 directions while making the grid uniform (the idea is to plot that and also save the interpolated data (x,y and z) into a new file as well)?
2) what's a good way to contour plot the data with the shape I mentioned earlier (tried plt.contour, but it doesn't give a satisfactory result just plotting that straight up)?
Thanks y'all
Edit:
I believe this will illustrate the question better:
X: Time, Y: Altitude, Z: Latitude or Longitude
I essentially want to fill up the white space. (I understand the consequences of extrapolation and all, but at this point I just want an algorithm that works.) The blue dots are my grid and the color plot is just a normal plt.contour (no interpolation done). I want to make it such that I have blue dots all over the plot area.
Rafael! With respect to your interpolation question, I can explain the math if you want to manually come up with an interpolation function, but there is an existing resource you might want to look into: scipy.interpolate.RegularGridInterpolator
(see https://docs.scipy.org/doc/scipy-0.16.1/reference/generated/scipy.interpolate.RegularGridInterpolator.html)
If I have misunderstood your issue, another interpolation method from the same module might be appropriate: see scipy.interpolate
For plotting the 3d surface, https://matplotlib.org/examples/mplot3d/surface3d_demo.html might help guide you! Let me know if this helps! Just comment if you would like me to expand! Hopefully those are the resources you were looking for!
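Since the altitudes are irregular and contain NaNs, one candidate from scipy.interpolate is griddata, which works on scattered points (this is a swapped-in alternative, not the RegularGridInterpolator route suggested above). A minimal sketch on synthetic data: the arrays x, y and z below are made up to mimic the (99, 25) shapes in the question, and the nearest-neighbour fill outside the convex hull is a crude form of extrapolation.

```python
import numpy as np
from scipy.interpolate import griddata

rng = np.random.default_rng(0)
n_alt, n_time = 99, 25

# Synthetic stand-ins: time is constant down each column, altitude is irregular.
x = np.tile(np.arange(n_time, dtype=float), (n_alt, 1))
y = np.linspace(0.0, 10.0, n_alt)[:, None] + rng.normal(0.0, 0.02, (n_alt, n_time))
z = np.sin(x / 4.0) + 0.1 * y

# Knock out about 20% of the samples to imitate the gaps.
holes = rng.random(z.shape) < 0.2
y[holes] = np.nan
z[holes] = np.nan

# Keep only the samples where every coordinate and the value are known.
valid = ~np.isnan(x) & ~np.isnan(y) & ~np.isnan(z)
points = np.column_stack([x[valid], y[valid]])
values = z[valid]

# Uniform target grid spanning the observed range.
xi = np.linspace(np.nanmin(x), np.nanmax(x), n_time)
yi = np.linspace(np.nanmin(y), np.nanmax(y), n_alt)
grid_x, grid_y = np.meshgrid(xi, yi)

# Linear interpolation inside the convex hull, nearest neighbour elsewhere.
z_lin = griddata(points, values, (grid_x, grid_y), method="linear")
z_near = griddata(points, values, (grid_x, grid_y), method="nearest")
z_filled = np.where(np.isnan(z_lin), z_near, z_lin)
```

grid_x, grid_y and z_filled are then regular arrays that can be passed to plt.contourf(grid_x, grid_y, z_filled) or saved to a file.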

Python: How to plot a heatmap for coordinates with different color intensity or different radius of circles?

Given some data in three lists, for example:
latitudes = [50.877979278564,48.550216674805,47.606079101562,50.772491455078,42.451354980469,43.074657440186,44.044174194336,44.563243865967,52.523406982422,50.772491455078]
longitudes = [4.700091838837, 9.038957595825, -122.333000183105, 7.190686225891, -76.476554870605, -89.403335571289, -123.070274353027, -123.281730651855, 13.411399841309, 7.190686225891]
counts = [15, 845, 2, 50, 95, 49, 67, 32, 1, 88]
which can be interpreted as: the coordinate i, which is (latitudes[i], longitudes[i]), occurs counts[i] times on the map.
I want to generate a heatmap with an appropriate scale. The coordinates should be represented by colour-filled circles. The diameter of the circles should somehow represent the count of the corresponding coordinate.
(As an alternative I thought about representing the count by colour intensity. I don't know which is best or if these two representations can be combined.)
How can I realize such a heatmap? (I assume that is what it is called?)
Perhaps it is relevant to mention the amount of data I am dealing with:
sum(counts) is about 1.000.000
there are around 25.000 different coordinates.
scatter is the method you are looking for, as it has two optional parameters to adjust either the size (keyword s) or the color (keyword c) of each point, or you can do both simultaneously. The color, or heatmap effect, is probably better for the density of points you have.
Here's an example of using this method:
import matplotlib.pyplot as plt
import numpy as np
NPOINTS = 1000
np.random.seed(101)
lat = np.random.random(NPOINTS)*8+44
lon = np.random.random(NPOINTS)*100-50
counts = np.random.randint(0,1000,NPOINTS)
plt.subplot(211)
plt.scatter(lat, lon, c=counts)
plt.colorbar()
plt.subplot(212)
plt.scatter(lat, lon, s=counts)
plt.savefig('scatter_example.png')
plt.show()
Resulting in:
If you choose to use size, you might want to adjust the count values to get a less crowded plot, for example by extending the above example with:
plt.figure()
COUNT_TO_SIZE = 1./10
plt.scatter(lat, lon, s=counts*COUNT_TO_SIZE)
plt.savefig('scatter_example2.png')
You get a cleaner plot:
I've of course accidentally swapped latitude and longitude from their normal axes, but you get the idea :)
I am not so sure on the heat map, but to plot with coloured circles of different sizes you can use:
from matplotlib import pyplot
pyplot.scatter(longitudes,latitudes,counts,c=rgb)
pyplot.show()
where rgb is a 2-d array of user defined rgb values, something like:
maxcount = float(max(counts))
rgb = [[ 1, 0.5, x/maxcount ] for x in counts]
or however you wish to define your colours.
In a general answer for any graphics library, you would want to do something like this:
maxSize = 10  # The maximum radius of the circles you wish to draw.
maxCount = float(max(counts))  # float, so the division below is never truncated
for lat, long, count in zip(latitudes, longitudes, counts):
    draw_circle(lat, long, count / maxCount * maxSize)  # Some drawing library, taking x, y, radius.
zip() allows you to join your three lists and iterate over them in one loop.
Dividing the count by the maximum count gives you a relative scale in size, which you then multiply up by the size you want the circles to be. If you wanted to change the colour too, you could do something like:
maxSize = 10  # The maximum radius of the circles you wish to draw.
maxCount = float(max(counts))  # float, so the division below is never truncated
for lat, long, count in zip(latitudes, longitudes, counts):
    intensity = count / maxCount
    draw_circle(lat, long, intensity * maxSize, Color(intensity * 255, 0, 0))  # Some drawing library, taking x, y, radius, colour.
Producing a sliding scale from black to red as intensity increases.
You may need to adjust the latitude and longitude values to produce sane x and y values, depending on the size you want in your final image and the values you are going to put in. If you find your counts get too large to display, and the smaller items too small when lowering the max size, you might want to consider a logarithmic scale instead of linear for the intensity.
Implementing this with an actual graphics library should be trivial, but depends on the library itself.
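With matplotlib as the graphics library, the recipe above might look roughly like this. It is a sketch using the ten sample points from the question; the area range (10 to 300 points squared), the "Reds" colormap and the transparency are arbitrary choices. Size is scaled linearly with the count, while the color uses a logarithmic scale so the small counts remain distinguishable.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line in a notebook
import matplotlib.pyplot as plt
import numpy as np
from matplotlib.colors import LogNorm

latitudes = [50.877979278564, 48.550216674805, 47.606079101562, 50.772491455078,
             42.451354980469, 43.074657440186, 44.044174194336, 44.563243865967,
             52.523406982422, 50.772491455078]
longitudes = [4.700091838837, 9.038957595825, -122.333000183105, 7.190686225891,
              -76.476554870605, -89.403335571289, -123.070274353027,
              -123.281730651855, 13.411399841309, 7.190686225891]
counts = np.array([15, 845, 2, 50, 95, 49, 67, 32, 1, 88])

# Marker area in points^2: 10 for a count near zero, 300 for the largest count.
areas = 10 + counts / counts.max() * 290

fig, ax = plt.subplots()
sc = ax.scatter(longitudes, latitudes, s=areas, c=counts, cmap="Reds",
                norm=LogNorm(vmin=counts.min(), vmax=counts.max()),
                alpha=0.7, edgecolors="k")
fig.colorbar(sc, ax=ax, label="count (log scale)")
ax.set_xlabel("longitude")
ax.set_ylabel("latitude")
```

For around 25,000 coordinates the markers will overlap heavily, so a smaller area range or a lower alpha is probably needed.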

How to plot an image with non-linear y-axis with Matplotlib using imshow?

How can I plot a 2D array as an image with Matplotlib, with the y scale relative to the power of two of the y value?
For instance the first row of my array will have a height in the image of 1, the second row will have a height of 4, etc. (units are irrelevant)
It's not simple to explain with words so look at this image please (that's the kind of result I want):
http://support.sas.com/rnd/app/da/new/802ce/iml/chap1/images/wavex1k.gif
As you can see, the first row is 2 times smaller than the one above it, and so on.
For those interested in why I am trying to do this:
I have a pretty big array (10, 700000) of floats, representing the discrete wavelet transform coefficients of a sound file. I am trying to plot the scalogram using those coefficients.
I could copy the array x times until I get the desired image row size but the memory cannot hold so much information...
Have you tried to transform the axis? For example:
ax = subplot(111)
ax.yaxis.set_ticks([0, 2, 4, 8])
imshow(data)
This means there must be gaps in the data for the non-existent coordinates, unless there is a way to provide a transform function instead of just lists (never tried).
Edit:
I admit it was just a lead, not a complete solution. Here is what I meant in more detail.
Let's assume you have your data in an array, a. You can use a transform like this one:
class arr(object):
    @staticmethod
    def mylog2(x):
        # Integer log2: how many times x can be halved before reaching 1.
        lx = 0
        while x > 1:
            x >>= 1
            lx += 1
        return lx
    def __init__(self, array):
        self.array = array
    def __getitem__(self, index):
        return self.array[arr.mylog2(index+1)]
    def __len__(self):
        return 1 << len(self.array)
Basically it will transform the first coordinate of an array or list with the mylog2 function (that you can transform as you wish - it's home-made as a simplification of log2). The advantage is, you can re-use that for another transform should you need it, and you can easily control it too.
Then map your array to this one, which doesn't make a copy but a local reference in the instance:
b = arr(a)
Now you can display it, for example:
ax = subplot(111)
ax.yaxis.set_ticks([16, 8, 4, 2, 1, 0])
axis([-0.5, 4.5, 31.5, 0.5])
imshow(b, interpolation="nearest")
Here is a sample (with an array containing random values):
http://img691.imageshack.us/img691/8883/clipboard01f.png
The best way I've found to make a scalogram using matplotlib is to use imshow, similar to the implementation of specgram. Using rectangles is slow, because you're having to make a separate glyph for each value. Similarly, you don't want to have to bake things into a uniform NumPy array, because you'll probably run out of memory fast, since your highest level is going to be about as long as half your signal.
Here's an example using SciPy and PyWavelets:
from pylab import *
import pywt
import scipy.io.wavfile as wavfile
# Find the highest power of two less than or equal to the input.
def lepow2(x):
    return int(2 ** floor(log2(x)))  # int, so the result can be used as a slice bound
# Make a scalogram given an MRA tree.
def scalogram(data):
    bottom = 0
    vmin = min(map(lambda x: min(abs(x)), data))
    vmax = max(map(lambda x: max(abs(x)), data))
    gca().set_autoscale_on(False)
    for row in range(0, len(data)):
        # Each level is drawn as its own one-row image, twice as tall as the previous one.
        scale = 2.0 ** (row - len(data))
        imshow(
            array([abs(data[row])]),
            interpolation = 'nearest',
            vmin = vmin,
            vmax = vmax,
            extent = [0, 1, bottom, bottom + scale])
        bottom += scale
# Load the signal, take the first channel, limit length to a power of 2 for simplicity.
rate, signal = wavfile.read('kitten.wav')
signal = signal[0:lepow2(len(signal)),0]
tree = pywt.wavedec(signal, 'db5')
# Plotting.
gray()
scalogram(tree)
show()
You may also want to scale values adaptively per-level.
This works pretty well for me. The only problem I have is that matplotlib creates a hairline-thin space between levels. I'm still looking for a way to fix this.
P.S. - Even though this question is pretty old now, I figured I'd respond here, because this page came up on Google when I was looking for a method of creating scalograms using MPL.
You can look at matplotlib.image.NonUniformImage. But that only assists with having a nonuniform axis; I don't think you're going to be able to plot adaptively like you want to (I think each point in the image is always going to have the same area, so you are going to have to repeat the wider rows multiple times). Is there any reason you need to plot the full array? Obviously the full detail isn't going to show up in any plot, so I would suggest heavily downsampling the original matrix so you can copy rows as required to get the image without running out of memory.
If you want both to be able to zoom and save memory, you could do the drawing "by hand". Matplotlib allows you to draw rectangles (they would be your "rectangular pixels"):
from matplotlib import patches
axes = subplot(111)
axes.add_patch(patches.Rectangle((0.2, 0.2), 0.5, 0.5))
Note that the extents of the axes are not set by add_patch(), but you can set them yourself to the values you want (axes.set_xlim,…).
PS: It looks to me like thrope's response (matplotlib.image.NonUniformImage) can actually do what you want, in a simpler way than the "manual" method described here!
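Another option, not mentioned in the answers above, is pcolormesh, which accepts explicit cell edges, so rows of different heights need neither copied data nor one patch per cell. A minimal sketch, assuming a small random array in place of the real coefficients and row heights that double from one row to the next:

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line in a notebook
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(0)
data = rng.random((5, 64))  # stand-in for 5 levels of coefficients

n_rows, n_cols = data.shape
x_edges = np.arange(n_cols + 1)
# Row edges at 0, 1, 3, 7, 15, 31, so the row heights are 1, 2, 4, 8, 16.
y_edges = 2 ** np.arange(n_rows + 1) - 1

fig, ax = plt.subplots()
mesh = ax.pcolormesh(x_edges, y_edges, data, shading="flat", cmap="gray")
ax.set_yticks(y_edges)
fig.colorbar(mesh, ax=ax)
```

For an array as wide as (10, 700000), downsampling along the x axis before plotting is still advisable, since no screen can show that many columns.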
