Changing the axis of a seaborn heatmap - python

I have a set of Cartesian coordinate pairs, along with a binary variable for each pair. I am plotting a heatmap where, in each bin, I compute the fraction of coordinates falling into that bin for which the binary variable is 1.
My problem is with the axes. As can be seen in the picture below, the resulting axis labels are strings that stand for bin boundaries. I would like the axes to show Cartesian coordinates. Is there a simple way to change this?
import numpy as np
import pandas as pd
import seaborn as sb
np.random.seed(0)
x = np.random.uniform(0,100, size=200)
y = np.random.uniform(0,100, size=200)
z = np.random.choice([True, False], size=200, p=[0.3, 0.7])
df = pd.DataFrame({"x" : x, "y" : y, "z":z})
binsx = 8
binsy = 5
res = df.groupby([pd.cut(df.y, binsy),pd.cut(df.x,binsx)])['z'].mean().unstack()
ax = sb.heatmap(res)
ax.axis('equal')
ax.invert_yaxis()

The following creates a scale by using the bins for histogramming as the extents of the image.
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
np.random.seed(0)
x = np.random.uniform(0,100, size=200)
y = np.random.uniform(0,100, size=200)
z = np.random.choice([True, False], size=200, p=[0.3, 0.7])
df = pd.DataFrame({"x" : x, "y" : y, "z":z})
binsx = np.arange(0,112.5,12.5)
binsy = np.arange(0,120,20)
res = df.groupby([pd.cut(df.y, binsy),pd.cut(df.x,binsx)])['z'].mean().unstack()
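# origin="lower" draws the first row (lowest y bin) at the bottom, so the image matches the extent
plt.imshow(res, cmap=plt.cm.Reds, origin="lower",
           extent=[binsx.min(), binsx.max(), binsy.min(), binsy.max()])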
plt.xticks(binsx)
plt.yticks(binsy)
plt.colorbar()
plt.grid(False)
plt.show()
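An alternative sketch that avoids pandas binning altogether: np.histogram2d counts all points and the points with z == True per bin, and pcolormesh draws the ratio directly against the bin edges, so the axes are in data coordinates. The edge arrays and the variable names (total, hits, frac) are illustrative assumptions, not part of the original answer.

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
x = rng.uniform(0, 100, size=200)
y = rng.uniform(0, 100, size=200)
z = rng.choice([True, False], size=200, p=[0.3, 0.7])

xedges = np.linspace(0, 100, 9)   # 8 bins along x (assumed edges)
yedges = np.linspace(0, 100, 6)   # 5 bins along y (assumed edges)

# Count all points, and the points with z == True, in each (y, x) bin.
total, _, _ = np.histogram2d(y, x, bins=[yedges, xedges])
hits, _, _ = np.histogram2d(y[z], x[z], bins=[yedges, xedges])

# Fraction of True values per bin; bins without any points become NaN.
with np.errstate(invalid="ignore", divide="ignore"):
    frac = hits / total

fig, ax = plt.subplots()
mesh = ax.pcolormesh(xedges, yedges, frac, cmap="Reds")
fig.colorbar(mesh, ax=ax, label="fraction with z == True")
ax.set_xticks(xedges)
ax.set_yticks(yedges)
ax.set_aspect("equal")
# plt.show()  # uncomment to display interactively
```

Because pcolormesh takes the edges themselves, unevenly spaced bins would also be drawn at their true positions, which imshow with a single extent cannot do.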

Related

Scatter plot markers color based on custom scale

I want to color my scatter points based on a custom color scale but I got this error
ValueError: 'c' argument has 150 elements, which is inconsistent with 'x' and 'y' with size 100.
For the example below, it seems like the length of t needs to be the same as x and y.
However, I want to color the points with a wider scale, for example -50 to 150 instead of 0 to 100.
How can I do this?
Thanks
import numpy as np
import matplotlib.pyplot as plt
x = np.random.rand(100)
y = np.random.rand(100)
t = np.arange(100)
plt.scatter(x, y, c=t)
plt.show()
I'm not really sure what your goal is, so here are two answers; hopefully one of them is what you are looking for.
First:
Simply create an array t of 100 elements ranging from -50 to 150:
import numpy as np
import matplotlib.pyplot as plt
x = np.random.rand(100)
y = np.random.rand(100)
t = np.linspace(-50, 150, len(x))
fig, ax = plt.subplots()
sc = ax.scatter(x, y, c=t)
fig.colorbar(sc, label="value")
plt.show()
Second:
Create an array t with 100 elements, ranging from 0 to 99, and ask matplotlib to map these values to colors over the range from -50 to 150.
import numpy as np
import matplotlib.pyplot as plt
x = np.random.rand(100)
y = np.random.rand(100)
t = np.arange(100)
fig, ax = plt.subplots()
sc = ax.scatter(x, y, c=t, vmin=-50, vmax=150)
fig.colorbar(sc, label="value")
plt.show()
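To see what vmin and vmax do in the second variant, here is a small sketch using matplotlib.colors.Normalize, which performs the same linear mapping to [0, 1] before the colormap lookup. The names norm, scaled, lo and hi are illustrative; the norm object is passed to scatter in place of vmin/vmax.

```python
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.colors import Normalize

x = np.random.rand(100)
y = np.random.rand(100)
t = np.arange(100)

# Linear mapping of the data range [-50, 150] onto [0, 1].
norm = Normalize(vmin=-50, vmax=150)
scaled = norm(t)
# t only spans 0..99, so it occupies the middle part of the colormap.
lo, hi = float(scaled.min()), float(scaled.max())

fig, ax = plt.subplots()
sc = ax.scatter(x, y, c=t, norm=norm)  # pass norm instead of vmin/vmax
fig.colorbar(sc, label="value")
# plt.show()  # uncomment to display interactively
```

Here lo is 0.25 and hi is 0.745, so the points use only about half of the available colors while the colorbar still runs from -50 to 150.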

How to do a 3D plot in matplotlib where two functions are evaluated with only one variable?

All 3D plot examples I find are of the form X,Y = data and f(X,Y) is the plotted function. What I need is a 3D plot where X contains data points while Y = f(X) and Z = g(X) are displayed on the respective axes. Is there a way to do so?
IIUC, you can put the x points in a DataFrame, apply the Y and Z functions to that column, and then plot the points as below:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d import Axes3D
def create_Y(x):
    return x**2 + 2*x - 10
def create_Z(x):
    return np.sin(x)
df = pd.DataFrame({'X': np.linspace(0.0, 3.0, num=500)})
df['Y'] = df['X'].apply(create_Y)
df['Z'] = df['X'].apply(create_Z)
fig = plt.figure(figsize=(12,7))
ax = fig.add_subplot(projection = '3d')
ax.plot(xs = df['X'], ys = df['Y'],zs = df['Z'])
plt.show()
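If the DataFrame is not needed for anything else, the same curve can be sketched with plain NumPy arrays, since both functions already work element-wise. The function names f and g are placeholders standing in for whatever Y = f(X) and Z = g(X) are in the real problem.

```python
import numpy as np
import matplotlib.pyplot as plt

def f(x):
    # Hypothetical Y = f(X); same formula as create_Y above.
    return x**2 + 2*x - 10

def g(x):
    # Hypothetical Z = g(X); same formula as create_Z above.
    return np.sin(x)

x = np.linspace(0.0, 3.0, num=500)
y = f(x)   # evaluated on the whole array at once
z = g(x)

fig = plt.figure(figsize=(12, 7))
ax = fig.add_subplot(projection="3d")
ax.plot(x, y, z)   # one parametric curve (x, f(x), g(x))
ax.set_xlabel("X")
ax.set_ylabel("Y = f(X)")
ax.set_zlabel("Z = g(X)")
# plt.show()  # uncomment to display interactively
```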

matplotlib: Plotting the path in 3D axis

I would like to plot the path based on x, y, z location data. Below is a reproducible example; all the lines keep starting from 0 instead of following one after another.
import seaborn as sns
# loading sample data and replicating my scenario
data = sns.load_dataset("iris")
# giving it a numeric value to replicate my scenario
cat_lbl = {'setosa': 1, 'versicolor': 2,'virginica' : 3}
data['cat_lbl'] = data['species'].map(cat_lbl)
#plot headings
species = ['setosa', 'versicolor', 'virginica']
import matplotlib as mpl
from mpl_toolkits.mplot3d import Axes3D
import matplotlib.pyplot as plt
sepal_length = data.loc[:,['sepal_length','cat_lbl']]
sepal_width = data.loc[:,['sepal_width','cat_lbl']]
petal_length = data.loc[:,['petal_length','cat_lbl']]
fig = plt.figure(figsize=([20,15]))
for lbl in range(3):
    lbl=lbl+1
    x=sepal_length[(sepal_length.cat_lbl == lbl)].values
    y=sepal_width[(sepal_width.cat_lbl == lbl)].values
    z=petal_length[(petal_length.cat_lbl == lbl)].values
    ax=fig.add_subplot(3,3,lbl, projection='3d')
    ax.plot(x.flatten(),y.flatten(),z.flatten())
    ax.set_title(species[lbl-1])
plt.show()
Your problem is that
x=sepal_length[(sepal_length.cat_lbl == lbl)].values
y=sepal_width[(sepal_width.cat_lbl == lbl)].values
z=petal_length[(petal_length.cat_lbl == lbl)].values
are actually 2D arrays that also contain the category index (1, 2, 3). So when you call x.flatten(), the result alternates between the coordinate and the category index (you can see that the lines actually loop back to (1,1) on the first graph, (2,2) on the second and (3,3) on the third).
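The interleaving can be seen on a small made-up array with the same two-column layout (value, category label); the numbers are illustrative only.

```python
import numpy as np

# Two columns per row: the measurement and its category label.
arr = np.array([[5.1, 1],
                [4.9, 1],
                [4.7, 1]])

flat = arr.flatten()   # row-major order: value, label, value, label, ...
values = arr[:, 0]     # only the measurement column

print(flat)    # [5.1 1.  4.9 1.  4.7 1. ]
print(values)  # [5.1 4.9 4.7]
```

Plotting flat makes the line jump back to the label value after every point, whereas values contains only the coordinates.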
Here is how I would write your code:
import seaborn as sns
import matplotlib as mpl
from mpl_toolkits.mplot3d import Axes3D
import matplotlib.pyplot as plt
data = sns.load_dataset("iris")
species = ['setosa', 'versicolor', 'virginica']
fig,axs = plt.subplots(1,3,subplot_kw=dict(projection='3d'),figsize=(9,3))
for sp,ax in zip(species, axs.flat):
    temp = data.loc[data['species']==sp]
    x=temp['sepal_length'].values
    y=temp['sepal_width'].values
    z=temp['petal_length'].values
    ax.plot(x,y,z)
    ax.set_title(sp)
plt.show()
Try ax.plot3D(...) instead of ax.plot(...) as indicated in this tutorial for 3D plotting:
import numpy as np
import matplotlib.pyplot as plt
ax = plt.axes(projection='3d')
# Data for a three-dimensional line
zline = np.linspace(0, 15, 1000)
xline = np.sin(zline)
yline = np.cos(zline)
ax.plot3D(xline, yline, zline, 'gray')
# Data for three-dimensional scattered points
zdata = 15 * np.random.random(100)
xdata = np.sin(zdata) + 0.1 * np.random.randn(100)
ydata = np.cos(zdata) + 0.1 * np.random.randn(100)
ax.scatter3D(xdata, ydata, zdata, c=zdata, cmap='Greens')
plt.show()

how to generate a series of histograms on matplotlib?

I would like to generate a series of histogram shown below:
The above visualization was done in tensorflow but I'd like to reproduce the same visualization on matplotlib.
EDIT:
Using plt.fill_between suggested by @SpghttCd, I have the following code:
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.cm as cm
colors = cm.OrRd_r(np.linspace(.2, .6, 10))
plt.figure()
x = np.arange(100)
for i in range(10):
    y = np.random.rand(100)
    plt.fill_between(x, y + 10-i, 10-i,
                     facecolor=colors[i],
                     edgecolor='w')
plt.show()
This works great, but is it possible to use histogram instead of a continuous curve?
EDIT:
joypy-based approach, as mentioned in the comment by october:
import pandas as pd
import joypy
import numpy as np
import matplotlib.cm as cm
df = pd.DataFrame()
for i in range(0, 400, 20):
    df[i] = np.random.normal(i/410*5, size=30)
joypy.joyplot(df, overlap=2, colormap=cm.OrRd_r, linecolor='w', linewidth=.5)
For finer control of colors, you can define a color gradient function which accepts a fractional index and start and stop color tuples:
def color_gradient(x=0.0, start=(0, 0, 0), stop=(1, 1, 1)):
    r = np.interp(x, [0, 1], [start[0], stop[0]])
    g = np.interp(x, [0, 1], [start[1], stop[1]])
    b = np.interp(x, [0, 1], [start[2], stop[2]])
    return (r, g, b)
Usage:
joypy.joyplot(df, overlap=2, colormap=lambda x: color_gradient(x, start=(.78, .25, .09), stop=(1.0, .64, .44)), linecolor='w', linewidth=.5)
[Figures: examples with different start and stop tuples]
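A two-color gradient like this can also be sketched with matplotlib's own LinearSegmentedColormap.from_list, which builds a callable colormap from a list of colors. Whether joypy accepts it in exactly the same way as the lambda is an assumption here (it is the same kind of object as cm.OrRd_r used earlier); the name "custom_gradient" is arbitrary.

```python
from matplotlib.colors import LinearSegmentedColormap

start = (0.78, 0.25, 0.09)
stop = (1.0, 0.64, 0.44)

# Linear interpolation between the two RGB tuples.
gradient = LinearSegmentedColormap.from_list("custom_gradient", [start, stop])

first = gradient(0.0)   # RGBA tuple at the start of the gradient
last = gradient(1.0)    # RGBA tuple at the end of the gradient
```

Unlike color_gradient, the colormap returns RGBA tuples (with an alpha component), and more than two anchor colors can be listed if the gradient should pass through intermediate shades.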
original answer:
You could iterate over the data arrays you'd like to plot with plt.fill_between, setting the fill colors to a gradient and the line color to white.
Creating some sample data:
import numpy as np
t = np.linspace(-1.6, 1.6, 11)
y = np.cos(t)**2
y2 = lambda : y + np.random.random(len(y))/5-.1
plot the series:
import matplotlib.pyplot as plt
import matplotlib.cm as cm
colors = cm.OrRd_r(np.linspace(.2, .6, 10))
plt.figure()
for i in range(10):
    plt.fill_between(t+i, y2()+10-i/10, 10-i/10, facecolor = colors[i], edgecolor='w')
If you want it better tailored to your example, consider providing some sample data.
EDIT:
As I commented below, I'm not quite sure I understand what you want, or whether what you want is the best choice for your task. Therefore, here is code that plots, besides the approach from your edit, two samples of how to present a bunch of histograms so that they are easier to compare:
import matplotlib.pyplot as plt
import numpy as np
import matplotlib.cm as cm
N = 10
np.random.seed(42)
colors=cm.OrRd_r(np.linspace(.2, .6, N))
fig1 = plt.figure()
x = np.arange(100)
for i in range(10):
    y = np.random.rand(100)
    plt.fill_between(x, y + 10-i, 10-i,
                     facecolor=colors[i],
                     edgecolor='w')
data = np.random.binomial(20, .3, (N, 100))
fig2, axs = plt.subplots(N, figsize=(10, 6))
for i, d in enumerate(data):
    axs[i].hist(d, range(20), color=colors[i], label=str(i))
fig2.legend(loc='upper center', ncol=5)
fig3, ax = plt.subplots(figsize=(10, 6))
ax.hist(data.T, range(20), color=colors, label=[str(i) for i in range(N)])
fig3.legend(loc='upper center', ncol=5)
This leads to the following plots: your plot from your edit, N histograms in N subplots, and N histograms side by side in one plot.
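To combine the offset fill_between layout with actual histograms rather than continuous curves, one possible sketch is to compute the counts with np.histogram and draw them as steps via the step="post" option of fill_between. The sample data, the shared bin edges and the per-row scaling to height 1 are assumptions made for illustration.

```python
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.cm as cm

N = 10
rng = np.random.default_rng(42)
data = rng.binomial(20, 0.3, size=(N, 100))   # illustrative sample data
edges = np.arange(21)                          # shared bin edges 0..20
colors = cm.OrRd_r(np.linspace(0.2, 0.6, N))

fig, ax = plt.subplots(figsize=(8, 5))
rows = []
for i, d in enumerate(data):
    counts, _ = np.histogram(d, bins=edges)
    height = counts / counts.max()             # scale each row to height 1
    # step="post" holds each value until the next edge, so the last
    # count is repeated to give one y value per bin edge.
    stepped = np.append(height, height[-1])
    offset = N - i                             # earlier series sit higher
    ax.fill_between(edges, stepped + offset, offset, step="post",
                    facecolor=colors[i], edgecolor="w")
    rows.append(stepped)
ax.set_yticks([])                              # offsets carry no meaning
# plt.show()  # uncomment to display interactively
```

Reducing the spacing between offsets (for example offset = (N - i) / 2) would make the rows overlap, closer to the TensorBoard-style look.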

Matplotlib: set "bad" colour in scatter plot

I'm having problems setting the color of np.nan values in my data set.
I already managed to get cmap.set_bad working in imshow plots, but it does not work in plt.scatter.
Anyway, my main goal is assigning a specific colour to bad values.
This is how I thought it would work (but it does not ;-)
import matplotlib.pyplot as plt
import numpy as np
n = 20
x = y = np.linspace(1, 10, n)
c = np.random.random_sample((n,))
c[4] = np.nan
c[8:12] = np.nan
cmap = plt.get_cmap('plasma')
cmap.set_bad(color='black', alpha = 1.)
plt.scatter(x, y, c=c, s=200, cmap=cmap)
This gives me the following output:
Of course, I could divide the dataset into two separate sets and overplot them, but I'm quite sure there is a much cleaner solution.
There is no black color in cmap plasma.
Array c has to store the values used to select colors from the current colormap cmap. If you set entries of c to NaN, you do not get a marker for those indices (4 and 8:12) on the scatter plot.
The first variant is to set color for selected indices manually:
import matplotlib.pyplot as plt
import numpy as np
n = 20
x = y = np.linspace(1, 10, n)
c = np.random.random_sample((n,))
#c[4] = np.nan
#c[8:12] = np.nan
c[4]=c[8:12]=0 # use the first color to mark elements 4 and 8:12
cmap = plt.get_cmap('plasma')
plt.scatter(x, y, s=200, c=c, cmap=cmap)
plt.show()
The second variant is to draw two scatter-plots:
import matplotlib.pyplot as plt
import numpy as np
n = 20
x = y = np.linspace(1, 10, n)
c = np.random.random_sample((n,))
c[4] = np.nan
c[8:12] = np.nan
cmap = plt.get_cmap('plasma')
# plot good values
indices = ~np.isnan(c)
plt.scatter(x[indices], y[indices], s=200, c=c[indices], cmap=cmap)
# plot bad values
plt.scatter(x[~indices], y[~indices], s=200, c='k')
plt.show()
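A third possibility, not covered above, is the plotnonfinite argument of scatter, which is assumed here to be available (it was added in matplotlib 3.1, as far as I know). With plotnonfinite=True, points whose c value is NaN are drawn using the colormap's "bad" color instead of being dropped. The colormap is copied first so the registered plasma colormap is not modified.

```python
import copy
import numpy as np
import matplotlib.pyplot as plt

n = 20
x = y = np.linspace(1, 10, n)
c = np.random.default_rng(0).random(n)
c[4] = np.nan
c[8:12] = np.nan

# Work on a copy so the globally registered colormap stays untouched.
cmap = copy.copy(plt.get_cmap("plasma"))
cmap.set_bad(color="black", alpha=1.0)

fig, ax = plt.subplots()
# plotnonfinite=True keeps NaN points and colors them with the bad color.
sc = ax.scatter(x, y, c=c, s=200, cmap=cmap, plotnonfinite=True)
fig.colorbar(sc)
# plt.show()  # uncomment to display interactively
```

This keeps a single scatter call and a single colorbar, at the cost of requiring a matplotlib version that supports the argument.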
