Controlling Bin Widths in Altair - python

I have a set of numbers that I'd like to plot on a histogram.
Say:
import numpy as np
import matplotlib.pyplot as plt
my_numbers = np.random.normal(size = 1000)
plt.hist(my_numbers)
If I want to control the size and range of the bins I could do this:
plt.hist(my_numbers, bins=np.arange(-4,4.5,0.5))
Now, if I want to plot a histogram in Altair the code below will do, but how do I control the size and range of the bins in Altair?
import pandas as pd
import altair as alt
my_numbers_df = pd.DataFrame.from_dict({'Integers': my_numbers})
alt.Chart(my_numbers_df).mark_bar().encode(
alt.X("Integers", bin = True),
y = 'count()',
)
I have searched Altair's docs but all their explanations and sample charts (that I could find) just said bin = True with no further modification.
Appreciate any pointers :)

As demonstrated briefly in the Bin transforms section of the documentation, you can pass an alt.Bin() instance to fine-tune the binning parameters.
The equivalent of your matplotlib histogram would be something like this:
alt.Chart(my_numbers_df).mark_bar().encode(
alt.X("Integers", bin=alt.Bin(extent=[-4, 4], step=0.5)),
y='count()',
)
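To check that extent=[-4, 4] and step=0.5 describe the same bins as the matplotlib call, you can compute the edges with NumPy. The same frame also works if you would rather do the binning yourself and hand Altair finished bins. This is a sketch under assumptions: the column names bin_start, bin_end and count are illustrative, and the seeded generator only makes the run repeatable.

```python
import numpy as np
import pandas as pd

my_numbers = np.random.default_rng(0).normal(size=1000)

# Same edges as the matplotlib call: -4.0, -3.5, ..., 4.0 (17 edges -> 16 bins of width 0.5)
edges = np.arange(-4, 4.5, 0.5)
counts, _ = np.histogram(my_numbers, bins=edges)

# One row per bin; np.histogram does not count values outside [-4, 4]
binned_df = pd.DataFrame({
    "bin_start": edges[:-1],
    "bin_end": edges[1:],
    "count": counts,
})
print(binned_df.head())
```

Assuming your Altair version supports pre-binned data, this frame could be drawn with alt.Chart(binned_df).mark_bar().encode(alt.X('bin_start:Q', bin='binned'), x2='bin_end:Q', y='count:Q'), so Altair draws exactly the bins you computed.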

Related

Creating a 2 colour heatmap with Python

I have numerous sets of seasonal data that I am looking to show in a heatmap format. I am not worried about the magnitude of the values in the dataset, but more about the overall direction and any patterns that I can look at in more detail later. To do this I want to create a heatmap that only shows 2 colours (red for below zero and green for zero and above).
I can create a normal heatmap with seaborn, but the standard colour maps have more than 2 colours and I have not been able to create one myself. Even if I could, I don't know how to set the parameters to reflect the criteria below zero = red and zero or above = green.
I managed to create this simply by styling the dataframe, but I was unable to export it as a .png because the table_conversion='matplotlib' option removes the formatting.
Below is an example of what I would like to create, made from random data. Could someone help or point me in the direction of a helpful Stack Overflow answer?
I have also included the code I used to style and export the dataframe.
Desired output - this is created with random data in an Excel spreadsheet
#Code to create a regular heatmap - can this be easily amended?
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
# filename, h and chartpath are defined elsewhere in the original script
df_hm = pd.read_csv(filename+h)
pivot = df_hm.pivot_table(index='Year', columns='Month', values='delta', aggfunc='sum')
fig, ax = plt.subplots(figsize=(10,5))
ax.set_title('M1 '+h[:-7])
sns.heatmap(pivot, annot=True, fmt='.2f', cmap='RdYlGn', ax=ax)
plt.savefig(chartpath+h[:-7]+" M1.png", bbox_inches='tight')
plt.close()
#code used to export dataframe that loses format in the .png
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import dataframe_image as dfi
# pivot is the dataframe name; title and root are defined elsewhere in the original script
pivot = pd.DataFrame(np.random.randint(-100, 100, size=(5, 12)), columns=list('ABCDEFGHIJKL'))
styles = [dict(selector="caption", props=[("font-size", "120%"), ("font-weight", "bold")])]
pivot = (
    pivot.style.format(precision=2)
    .highlight_between(left=-100000, right=-0.01, props='color:white;background-color:red')
    .highlight_between(left=0, right=100000, props='color:white;background-color:green')
    .set_caption(title)
    .set_table_styles(styles)
)
dfi.export(pivot, root+'testhm.png', table_conversion='matplotlib', chrome_path=None)
You can set the cmap parameter to a list of colors. If you also want annotations, they will show the original values, because the data are not converted to -1 or 1.
import numpy as np
import seaborn as sns
arr = np.random.randn(10,10)
sns.heatmap(arr,cmap=["grey",'green'],annot=True,center=0)
# center=0 makes zero the dividing point between the two colors
Output:
PS: If you don't want the color bar, pass `cbar=False` to `sns.heatmap`.
Welcome to SO!
To achieve what you need, pass delta through the sign function (np.sign). Here's some example code:
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import numpy as np
arr = np.random.randn(25,25)
sns.heatmap(np.sign(arr))
This results in a binary heatmap (each cell becomes -1 or 1; exact zeros become 0), albeit one with a rather ugly colormap. You can experiment with Seaborn's colormaps to make it look like the Excel version.
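Because np.sign maps exact zeros to 0, a third value, one way to encode the "zero and above is green" rule directly is to classify each cell first and plot the classes with a two-entry colormap, annotating with the original numbers. This is a minimal matplotlib-only sketch, not taken from either answer: the random delta array is a hypothetical stand-in for the Year x Month pivot table, and the output filename is made up.

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the script also runs without a display
import matplotlib.pyplot as plt
from matplotlib.colors import ListedColormap

# Hypothetical stand-in for the Year x Month pivot table of deltas
rng = np.random.default_rng(0)
delta = rng.integers(-100, 100, size=(5, 12)).astype(float)
delta[0, 0] = 0.0  # include an exact zero to exercise the ">= 0 is green" rule

# 0 = below zero (red), 1 = zero or above (green)
category = (delta >= 0).astype(int)
two_colours = ListedColormap(["red", "green"])

fig, ax = plt.subplots(figsize=(10, 5))
ax.imshow(category, cmap=two_colours, vmin=0, vmax=1, aspect="auto")

# Annotate each cell with the original value rather than its 0/1 class
for (row, col), value in np.ndenumerate(delta):
    ax.text(col, row, f"{value:.2f}", ha="center", va="center", color="white")

fig.savefig("two_colour_heatmap.png", bbox_inches="tight")
plt.close(fig)
```

Since the figure is a normal matplotlib figure, savefig keeps the colours, which sidesteps the styling loss seen with the dataframe export.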

Altair chart: Show less lines in the grid

I'm working on a chart using Altair, and I'm trying to figure out how to have fewer lines in the background grid. Is there a term for that background grid?
Here's a chart that looks like mine, that I took from the tutorial:
Let's say that I want to have half as many grid lines on the X axis. How could I do that?
Grid lines are drawn at the location of ticks, so to adjust the grid lines you can adjust the ticks. For example:
import altair as alt
import numpy as np
import pandas as pd
x = np.arange(100)
source = pd.DataFrame({
'x': x,
'f(x)': np.sin(x / 5)
})
alt.Chart(source).mark_line().encode(
x=alt.X('x', axis=alt.Axis(tickCount=4)),
y='f(x)'
)
You can see other tick-related properties in the documentation for alt.Axis.

Python Matplotlib: plotting histogram with overlapping boundaries removed

I am plotting a histogram using Matplotlib in Python with the matplotlib.pyplot.bar() function. This gives me plots that look like this:
I am trying to produce a histogram that only plots the caps of each bar and the sides that don't directly share space with the border of another bar, more like this (I edited this using GIMP):
How can I achieve this using Python? Answers using matplotlib are preferable since that is what I have the most experience with but I am open to anything that works using Python.
For what it's worth, here's the relevant code:
import numpy as np
import matplotlib.pyplot as pp
bin_edges, bin_values = np.loadtxt("datafile.dat",unpack=True)
bin_edges = np.append(bin_edges,500.0)
# width of each bin = distance from its left edge to the next edge
bin_widths = np.diff(bin_edges)
# align='edge' puts each bar's left side at its bin edge; newer Matplotlib centers bars by default
pp.bar(bin_edges[:-1],bin_values,width=bin_widths,align='edge',color="none",edgecolor='black',lw=2)
pp.savefig("name.pdf")
I guess the easiest way is to use the step function instead of bar:
http://matplotlib.org/examples/pylab_examples/step_demo.html
Example:
import numpy as np
import matplotlib.pyplot as pp
# Simulate data
bin_edges = np.arange(100)
bin_values = np.exp(-np.arange(100)/5.0)
# Prepare figure output
pp.figure(figsize=(7,7),edgecolor='k',facecolor='w')
pp.step(bin_edges,bin_values, where='post',color='k',lw=2)
pp.tight_layout(pad=0.25)
pp.show()
If the given bin_edges are the left edges, use where='post'; if they are the right edges, use where='pre'. The only issue I see is that step doesn't draw the last bin correctly with 'post' (or the first bin with 'pre'). You can fix that by adding an extra 0 bin after (or before) your data, so that everything is drawn properly.
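A sketch of that padding trick for data shaped like the question's (N+1 edges for N values). The edge and value arrays are made-up stand-ins for datafile.dat, and the Agg backend line only lets the script run without a display.

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # headless backend; remove to get an interactive window
import matplotlib.pyplot as pp

# Made-up irregular bins: N+1 edges for N values (stand-in for datafile.dat)
bin_edges = np.array([0.0, 10.0, 25.0, 50.0, 100.0, 200.0, 500.0])
bin_values = np.array([5.0, 9.0, 7.0, 4.0, 2.0, 1.0])

# With where='post', y[i] is held from x[i] to x[i+1]. Appending a 0 gives the
# last edge its own y value, so the final cap is drawn and the line drops to 0.
padded_values = np.append(bin_values, 0.0)

fig, ax = pp.subplots(figsize=(7, 7))
(line,) = ax.step(bin_edges, padded_values, where="post", color="k", lw=2)
fig.savefig("name.pdf")
pp.close(fig)
```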
Example 2 - If you want to bin some data and draw a histogram you could do something like this:
# Simulate data
data = np.random.rand(1000)
# Prepare histogram
nBins = 100
rng = [0,1]
n,bins = np.histogram(data,nBins,rng)
x = bins[:-1] + 0.5*np.diff(bins)
# Prepare figure output
pp.figure(figsize=(7,7),edgecolor='k',facecolor='w')
pp.step(x,n,where='mid',color='k',lw=2)
pp.show()
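Two other options avoid the padding entirely: hist() with histtype='step', and, in Matplotlib 3.4 or later, stairs(), which accepts N counts and N+1 edges directly (so it also fits precomputed bin_edges/bin_values like the question's). A sketch comparing both on simulated data, assuming a Matplotlib version that has stairs():

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")
import matplotlib.pyplot as pp

rng = np.random.default_rng(0)
data = rng.random(1000)

fig, (ax1, ax2) = pp.subplots(1, 2, figsize=(10, 4))

# Option 1: let hist() do the binning and draw only the outline
n, bins, _ = ax1.hist(data, bins=100, range=(0, 1), histtype="step", color="k", lw=2)

# Option 2: bin with NumPy, then draw the outline with stairs(),
# which takes the counts and the edges directly, so no padding is needed
counts, edges = np.histogram(data, bins=100, range=(0, 1))
ax2.stairs(counts, edges, color="k", lw=2)

pp.close(fig)
```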

Custom scale from simple list or dict?

I need to make a custom scale for an axis. Before diving into http://matplotlib.org/examples/api/custom_scale_example.html, I'm wondering if there is an easier way for my special case.
A picture is worth a thousand words, so here we go:
See the value in each row next to the filename? I would like each row's height to be proportional to the difference between its value and the previous one. I'd start from 0 and would have to define a top limit so that the last row is visible.
Try matplotlib's pcolormesh, with which you can create irregularly spaced grids.
from matplotlib import pyplot as plt
import numpy as np
y1D = np.hstack([0, np.random.random(10)])  # 11 row edges -> 10 rows, matching the (10, 10) colour array
y1D = np.sort(y1D)/np.max(y1D)
x, y = np.meshgrid(np.arange(0,1.1,0.1),y1D)
plt.pcolormesh(x,y, np.random.random((10,10)))
plt.show()
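Applied more directly to the question: if each row should span from the previous value up to its own value (starting at 0), the row edges are just 0 followed by the sorted values. A sketch with hypothetical file names and values; with shading='flat', pcolormesh needs one more edge than there are rows and columns.

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")
import matplotlib.pyplot as plt

# Hypothetical file names and sorted values, standing in for the ones in the screenshot
names = ["file_a", "file_b", "file_c", "file_d", "file_e"]
values = np.array([1.0, 1.5, 4.0, 4.2, 7.0])

# Row i spans from the previous value (0 for the first row) up to values[i],
# so each row's height equals the difference to the previous value
y_edges = np.concatenate([[0.0], values])
x_edges = np.array([0.0, 1.0])    # a single column
colours = values.reshape(-1, 1)   # one cell per row, shape (len(values), 1)

fig, ax = plt.subplots()
mesh = ax.pcolormesh(x_edges, y_edges, colours, shading="flat", cmap="Reds")
# Label each row at its vertical centre
ax.set_yticks((y_edges[:-1] + y_edges[1:]) / 2)
ax.set_yticklabels(names)
fig.colorbar(mesh, ax=ax)
plt.close(fig)
```

The top of the plot is the last value (7.0 here); append a larger number to y_edges and values if the last row should extend further.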
You can use this recipe and adapt to your needs:
import numpy as np
import matplotlib.pyplot as plt
grid = np.zeros((20,20))
for i in range(grid.shape[0]):
    r = np.random.randint(1,19)
    grid[i,:r] = np.random.randint(10,30,size=(r,))
plt.imshow(grid,origin='lower',cmap='Reds',interpolation='nearest')
plt.yticks(list(range(20)),['File '+str(i) for i in range(20)])
plt.colorbar()
plt.show()
The result is this:

Plot quartiles of data series in a matplotlib chart

I would like to illustrate the quartiles of a distribution sample with matplotlib. It is probably best explained by an example:
import matplotlib.pyplot as plt
import numpy as np
import random
x = sorted([random.randrange(0,n) for n in range(1,1000)])
median_y = np.median(x)
median_x = x.index(median_y)
plt.plot(x)
plt.plot((median_x,median_x), (0,median_y),'k:')
plt.plot((0,median_x), (median_y,median_y),'k:')
Do you see a more convenient way to add quartiles 1,2 (median), and 3 than my clumsy solution? I could not find any command to plot a point with helper lines like this. And how could I add numbers to the points or axes?
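One possible approach (not from the original thread): compute all three quartiles with np.percentile, draw the dotted helper lines with vlines/hlines, and label each point with annotate. This is a sketch under assumptions: the data are sorted as in the question, the helper lines start at 0 on both axes, and the label format is arbitrary.

```python
import random
import numpy as np
import matplotlib
matplotlib.use("Agg")
import matplotlib.pyplot as plt

random.seed(0)
y = sorted(random.randrange(0, n) for n in range(1, 1000))
x = np.arange(len(y))

fig, ax = plt.subplots()
ax.plot(x, y)

quartiles = {}
for q in (25, 50, 75):
    # value of the q-th percentile and the first index where the sorted data reach it
    q_y = np.percentile(y, q)
    q_x = int(np.searchsorted(y, q_y))
    quartiles[q // 25] = (q_x, q_y)
    # dotted helper lines from both axes to the point
    ax.vlines(q_x, 0, q_y, colors="k", linestyles=":")
    ax.hlines(q_y, 0, q_x, colors="k", linestyles=":")
    ax.plot(q_x, q_y, "ko")
    ax.annotate(f"Q{q // 25}: {q_y:g}", (q_x, q_y), xytext=(5, -12), textcoords="offset points")

plt.close(fig)
```

To put the quartile positions on the axes as numbers, you could pass them to ax.set_xticks and ax.set_yticks.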
