I am plotting a histogram in Python with Matplotlib's bar() function. This gives me plots that look like this:
I am trying to produce a histogram that only draws the caps of each bar and the sides that don't directly border another bar, more like this (I edited this mock-up in GIMP):
How can I achieve this using Python? Answers using matplotlib are preferable since that is what I have the most experience with but I am open to anything that works using Python.
For what it's worth, here's the relevant code:
import numpy as np
import matplotlib.pyplot as pp
bin_edges, bin_values = np.loadtxt("datafile.dat",unpack=True)
bin_edges = np.append(bin_edges,500.0)
bin_widths = []
for j in range(len(bin_values)):
    bin_widths.append(bin_edges[j+1] - bin_edges[j])
pp.bar(bin_edges[:-1],bin_values,width=bin_widths,color="none",edgecolor='black',lw=2)
pp.savefig("name.pdf")
I guess the easiest way is to use the step function instead of bar:
http://matplotlib.org/examples/pylab_examples/step_demo.html
Example:
import numpy as np
import matplotlib.pyplot as pp
# Simulate data
bin_edges = np.arange(100)
bin_values = np.exp(-np.arange(100)/5.0)
# Prepare figure output
pp.figure(figsize=(7,7),edgecolor='k',facecolor='w')
pp.step(bin_edges,bin_values, where='post',color='k',lw=2)
pp.tight_layout(pad=0.25)
pp.show()
If your bin_edges represent the left edges, use where='post'; if they are the right edges, use where='pre'. The only issue I see is that step doesn't really draw the last (first) bin completely if you use post (pre). But you can just append (prepend) a 0 bin to your data to make it draw everything properly.
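For example, with left edges and where='post', the padding could look like this (a minimal sketch reusing the simulated data from above; the position of the extra right edge is an assumption, use your real one):
import numpy as np
import matplotlib.pyplot as pp
bin_edges = np.arange(100)
bin_values = np.exp(-np.arange(100)/5.0)
# Append one extra edge and a zero value so the last bin gets a closing right side
edges = np.append(bin_edges, bin_edges[-1] + 1)
values = np.append(bin_values, 0.0)
pp.step(edges, values, where='post', color='k', lw=2)
pp.show()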
Example 2 - If you want to bin some data and draw a histogram you could do something like this:
# Simulate data
data = np.random.rand(1000)
# Prepare histogram
nBins = 100
rng = [0,1]
n,bins = np.histogram(data,nBins,rng)
x = bins[:-1] + 0.5*np.diff(bins)
# Prepare figure output
pp.figure(figsize=(7,7),edgecolor='k',facecolor='w')
pp.step(x,n,where='mid',color='k',lw=2)
pp.show()
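As a side note (not part of the approach above), matplotlib's plt.hist can also draw an outline-only histogram directly via histtype='step', which skips the manual binning when you start from raw data; a minimal sketch:
import numpy as np
import matplotlib.pyplot as pp
data = np.random.rand(1000)
# histtype='step' draws only the outline of the histogram, without filled bars
pp.hist(data, bins=100, range=(0, 1), histtype='step', color='k', lw=2)
pp.show()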
Summary
I have a 2880x2880 similarity matrix (about 8.3 million cells). My attempt with HoloViews resulted in a 500 MB HTML file which never finishes "opening". So how do I make a round heatmap of the matrix?
Details
I had data from 10 different places, measured over 1 whole year. The hours of each month were turned into arrays, so each month had 24 arrays (one for all 00:00, one for all 01:00 ... 22:00, 23:00).
These were about 28-31 cells long, and each cell held the measurement of the quantity I'm trying to analyze. So there are 24 such arrays for each month of one whole year, i.e. 24x12 = 288 arrays per place, and there are measurements from 10 places. So a total of 2880 arrays were created, all compared to each other, and the similarity coefficients saved in a 2880x2880 matrix.
I'm trying to turn it into a radial similarity matrix like the one from HoloViews, but without the ticks and labels (since a format like Place01Jan0800 would be cumbersome to read across 2880 rows), just the shape, colors, and divisions:
I managed to create the HTML file itself, but it ended up being 500 MB, so it never renders when I open it; the page is just blank. I've added a minimal example below of what I have, with the loading of the data file replaced by randomly generated data.
import sys
sys.setrecursionlimit(10000)
import random
import numpy as np
import pandas as pd
import holoviews as hv
from holoviews import opts
from bokeh.plotting import show
import gc
# Function creating dummy data for this example
def transformer():
    dimension = 2880
    dummy_matrix = [[random.random() for i in range(dimension)] for j in range(dimension)]  # Fake, similar data
    col_vals = [str(i % dimension) for i in range(dimension*dimension)]   # Placeholder column labels (one per cell, repeating across rows)
    row_vals = [str(i // dimension) for i in range(dimension*dimension)]  # Placeholder row labels
    val_vals = np.reshape(np.array(dummy_matrix), -1).tolist()            # Flatten the matrix into a 1-D list
    idx_vals = [i for i in range(dimension*dimension)]                    # Placeholder index
    return idx_vals, val_vals, row_vals, col_vals
idx_arr, val_arr, row_arr, col_arr = transformer()
df = pd.DataFrame({"values": val_arr, "x-label": row_arr, "y-label": col_arr}, index=idx_arr)
hv.extension('bokeh')
heatmap = hv.HeatMap(df, ["x-label", "y-label"])
heatmap.opts(opts.HeatMap(cmap="viridis", radial=True))
gc.collect() # Attempt to save memory, because this thing is huge
show(hv.render(heatmap))
I had a look at datashader to see if it would help, but I have no idea how to plug it into this radial heatmap (if that is even possible for this case), since the radial heatmap doesn't seem to support datashading.
So I have no idea how to tackle this. I would be content with a broad overview too; I don't need the details, the hover infobox, the ability to zoom, or any fancy extra features. I just need the general overview for a presentation. I'm open to any solution, really.
I recommend using a regular heatmap instead of a radial heatmap for showing the similarity matrix. The reasons are:
The radial heatmap is designed for periodic variables. The time variable (the 288 time slots) can be considered periodic, but the 288*10 combination (288 time slots, 10 places) is no longer periodic because of the "place" dimension.
Near the center of a radial heatmap, the colored cells become too dense for a human to read.
The following is a simple code example that shows a heatmap.
import matplotlib.cm
import matplotlib.pyplot as plt
from matplotlib.colors import Normalize
import numpy as np
n = 2880
m = 2880
dummy_matrix = np.random.rand(m, n)
fig = plt.figure(figsize=(50,50)) # change the figsize to control the resolution
ax = fig.add_subplot(111)
cmap = matplotlib.cm.get_cmap("Blues") # you may use another built-in colormap or define your own
# if your data is not in the range [0, 1], use a normalization; here it is normalized by the min and max values
norm = Normalize(vmin=np.amin(dummy_matrix), vmax=np.amax(dummy_matrix))
image = ax.imshow(dummy_matrix, cmap=cmap, norm=norm)
plt.colorbar(image)
plt.show()
Which gives:
Another idea: perhaps the computation of the similarity matrix is unnecessary, and you can plot the original 288 * 10 data directly using a radial heatmap or just a normal heatmap, so that the similarity can be read from the color distribution directly.
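If you go that route, a minimal sketch of the idea could look like this (the 288 x 10 shape and the reduction of each raw array to a single value per time slot are assumptions, not taken from your data):
import numpy as np
import matplotlib.pyplot as plt
# Hypothetical reduced data: 288 time slots (24 hours x 12 months) x 10 places,
# e.g. each cell being an average of the corresponding raw 28-31 value array
reduced = np.random.rand(288, 10)
fig, ax = plt.subplots(figsize=(4, 12))
im = ax.imshow(reduced, aspect="auto", cmap="viridis")
ax.set_xlabel("place")
ax.set_ylabel("time slot (hour x month)")
fig.colorbar(im, ax=ax)
plt.show()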
Plain Matplotlib seems to be able to handle it, based on answers from here: How do I create radial heatmap in matplotlib?
import random
import matplotlib.pyplot as plt
import numpy as np
n = 2880
m = 2880
rad = np.linspace(0, 10, m)
a = np.linspace(0, 2 * np.pi, n)
r, th = np.meshgrid(rad, a)
dummy_matrix = ([[ random.random() for i in range(n) ] for j in range(m)])
plt.subplot(projection="polar")
plt.pcolormesh(th, r, dummy_matrix, cmap = 'Blues')
plt.plot(a, r, ls='none', color = 'k')
plt.grid()
plt.colorbar()
plt.savefig("custom_radial_heatmap.png")
plt.show()
And it didn't even take an eternity; it only took about 20 seconds at most.
You would think it would turn out monstrous and jagged like that,
but the sheer number of points drowns out the jaggedness, WOOHOO!
There are some things left to be desired, like labels and ticks, but I think I'll figure those out.
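If labels do turn out to be needed, the polar axes expose the usual tick machinery; a rough sketch of how that could be bolted onto the plot above (the 12 month labels are placeholders):
import numpy as np
import matplotlib.pyplot as plt
ax = plt.subplot(projection="polar")
# ... the pcolormesh call from above would go here ...
ax.set_xticks(np.linspace(0, 2*np.pi, 12, endpoint=False))  # 12 angular ticks, e.g. one per month
ax.set_xticklabels(["M{}".format(i+1) for i in range(12)])  # placeholder labels
ax.set_yticks([])  # hide radial ticks
plt.show()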
I have a set of numbers that I'd like to plot on a histogram.
Say:
import numpy as np
import matplotlib.pyplot as plt
my_numbers = np.random.normal(size = 1000)
plt.hist(my_numbers)
If I want to control the size and range of the bins I could do this:
plt.hist(my_numbers, bins=np.arange(-4,4.5,0.5))
Now, if I want to plot a histogram in Altair the code below will do, but how do I control the size and range of the bins in Altair?
import pandas as pd
import altair as alt
my_numbers_df = pd.DataFrame.from_dict({'Integers': my_numbers})
alt.Chart(my_numbers_df).mark_bar().encode(
    alt.X("Integers", bin=True),
    y='count()',
)
I have searched Altair's docs but all their explanations and sample charts (that I could find) just said bin = True with no further modification.
Appreciate any pointers :)
As demonstrated briefly in the Bin transforms section of the documentation, you can pass an alt.Bin() instance to fine-tune the binning parameters.
The equivalent of your matplotlib histogram would be something like this:
alt.Chart(my_numbers_df).mark_bar().encode(
    alt.X("Integers", bin=alt.Bin(extent=[-4, 4], step=0.5)),
    y='count()',
)
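As a follow-up, alt.Bin accepts other tuning parameters as well; for example, maxbins gives Vega-Lite an upper bound on the number of bins instead of an exact step (the chart below is just an illustration using the same DataFrame):
alt.Chart(my_numbers_df).mark_bar().encode(
    alt.X("Integers", bin=alt.Bin(maxbins=20)),
    y='count()',
)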
I was searching for how to plot function graphs, not only linear ones. I already know how to plot from simple points; those are the linear ones, like the example below:
import numpy
import matplotlib.pyplot as plt
%matplotlib inline
_=plt.plot([4,7],[5,7],color ='w')
_=plt.plot([4,7],[7,7],color ='w')
ax = plt.gca()
ax.set_facecolor('xkcd:red')
plt.show()
Then, after a bit of searching, I found this code:
import pylab
import numpy
x = numpy.linspace(-15,15,100) # 100 linearly spaced numbers
y = numpy.sin(x)/x # computing the values of sin(x)/x
# compose plot
pylab.plot(x,y) # sin(x)/x
pylab.plot(x,y,'co') # same function with cyan dots
pylab.plot(x,2*y,x,3*y) # 2*sin(x)/x and 3*sin(x)/x
pylab.show() # show the plot
That works perfectly! But what I'm wondering is: do we really need to use the standard functions defined by NumPy (like sin(x)/x here)? Or can we define a function ourselves and use it with NumPy too, like x**3?
This solved the issue, thanks FlyingTeller.
An example of a y = x**3 graph:
import pylab
import numpy
x = numpy.linspace(-15,15,100) # 100 linearly spaced numbers
y = x**3 # change this to plot whatever function we want
# compose plot
pylab.plot(x,y)
pylab.show()
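For completeness, the function doesn't have to be a single expression either; any Python function built from elementwise NumPy operations can be plotted the same way (a small sketch with a made-up function):
import numpy
import pylab
def my_func(x):
    # any elementwise combination of NumPy operations works on the whole array
    return x**3 * numpy.exp(-numpy.abs(x)/5.0)
x = numpy.linspace(-15, 15, 100)
pylab.plot(x, my_func(x))
pylab.show()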
If I want to color a square grid with a different color in each grid cell, this is possible in MATLAB with a simple call to the imagesc command, like here.
What if I want to color different cells in a grid like this:
Is this functionality available by default in either Python or MATLAB? I tried discretizing this grid into very small square cells and then coloring each cell. That works, but it seems crude. Is there a smarter way to get this done?
In Python, there is the built-in polar projection for the axes. This projection allows you to use almost every plotting method in polar coordinates automatically. In particular, you need to use pcolor or pcolormesh as follows:
import numpy as np
from matplotlib import pyplot as plt
r = np.linspace(0,4,5)
theta = np.linspace(0,2*np.pi,10)
theta,r = np.meshgrid(theta,r)
values = np.random.rand(*(theta.shape))
ax = plt.subplot(111,polar=True)
ax.pcolor(theta,r,values)
plt.show()
Note that this will produce a plot like this
which is almost what you want. The obvious problem is that the patch vertices are joined by straight lines rather than lines that follow the circular arc. You can solve this by making the angles array denser. Here is a possible way to do it.
import numpy as np
from matplotlib import pyplot as plt
r = np.linspace(0,4,5)
theta = np.linspace(0,2*np.pi,10)
values = np.random.rand(r.size,theta.size)
dense_theta = np.linspace(0,2*np.pi,100)
v_indeces = np.zeros_like(dense_theta, dtype=int)  # plain int (np.int is deprecated in recent NumPy)
i = -1
for j, dt in enumerate(dense_theta):
    if dt >= theta[i+1]:
        i += 1
    v_indeces[j] = i
T,R = np.meshgrid(dense_theta,r)
dense_values = np.zeros_like(T)
for i, v in enumerate(values):
    for j, ind in enumerate(v_indeces):
        dense_values[i, j] = v[ind]
ax = plt.subplot(111,polar=True)
ax.pcolor(T,R,dense_values)
plt.show()
Which would produce
I am not aware of a way to do this in MATLAB, but I googled around and found this, which says it can produce pcolor plots in polar coordinates. You should check it out.
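As a side note, the index-matching loops above could probably be written more compactly with NumPy; a sketch of the same idea using np.searchsorted (not benchmarked against the original):
import numpy as np
from matplotlib import pyplot as plt
r = np.linspace(0, 4, 5)
theta = np.linspace(0, 2*np.pi, 10)
values = np.random.rand(r.size, theta.size)
dense_theta = np.linspace(0, 2*np.pi, 100)
# for each dense angle, find the index of the coarse sector it falls into
v_indices = np.clip(np.searchsorted(theta, dense_theta, side='right') - 1, 0, theta.size - 1)
dense_values = values[:, v_indices]
T, R = np.meshgrid(dense_theta, r)
ax = plt.subplot(111, polar=True)
ax.pcolor(T, R, dense_values)
plt.show()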
I am trying to figure out how to make a 3D figure of univariate kdensity plots as they change over time (since they pull from a sliding time window of data over time).
Since I can't figure out how to do that directly, I am first trying to get the x, y plotting data for kdensity plots from matplotlib in Python. I hope that after I extract them I can use them, along with a time variable, to make a three-dimensional plot.
I see several posts telling how to do this in Matlab. All reference getting Xdata and Ydata from the underlying figure:
x=get(h,'Xdata')
y=get(h,'Ydata')
How about in Python?
The answer was already contained in another thread (How to create a density plot in matplotlib?). It is pretty easy to get a set of kdensity x's and y's from a set of data.
import matplotlib.pyplot as plt
import numpy as np
from scipy.stats import gaussian_kde
data = [1.5]*7 + [2.5]*2 + [3.5]*8 + [4.5]*3 + [5.5]*1 + [6.5]*8 # data is a set of univariate data
xs = np.linspace(0,max(data),200) # This 200 sets the # of x (and so also y) points of the kdensity plot
density = gaussian_kde(data)
density.covariance_factor = lambda : .25
density._compute_covariance()
ys = density(xs)
plt.plot(xs,ys)
And there you have it: both the kdensity plot and its underlying x, y data.
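To tie this back to the original goal (a 3D figure of densities over time), the extracted xs/ys can be stacked along a time axis; a rough sketch with fabricated time windows (the windowing itself is an assumption about your data):
import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import gaussian_kde
from mpl_toolkits.mplot3d import Axes3D  # registers the 3d projection (only needed on older matplotlib)
fig = plt.figure()
ax = fig.add_subplot(111, projection='3d')
xs = np.linspace(0, 8, 200)
for t in range(10):  # 10 hypothetical sliding time windows
    window = np.random.normal(loc=3 + 0.2*t, scale=1.0, size=100)  # fake data for window t
    ys = gaussian_kde(window)(xs)
    ax.plot(xs, ys, zs=t, zdir='y')  # one density curve per time step
ax.set_xlabel('value')
ax.set_ylabel('time window')
ax.set_zlabel('density')
plt.show()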
Not sure how kdensity plots work, but note that matplotlib.pyplot.plot returns a list of the added Line2D objects, which are, in fact, where the X and Y data are stored. I suspect they did that to make it work similarly to MATLAB.
import matplotlib.pyplot as plt
h = plt.plot([1,2,3],[2,4,6]) # [<matplotlib.lines.Line2D object at 0x021DA9F0>]
x = h[0].get_xdata() # [1,2,3]
y = h[0].get_ydata() # [2,4,6]