I have a Pandas dataframe like this -
The value in column 's' is the accuracy of a model for the corresponding values of 'k' and 'w'. So, it's strictly between 0 and 1.
I want to plot a heatmap as a 7x2 grid, with k (7 values) along the x-axis and w (2 values) along the y-axis, where each cell is colored according to the value of s.
I tried Seaborn's heatmap function, but it doesn't let me define which column to use to color the grid.
Try:
import matplotlib.pyplot as plt
import seaborn as sns
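# reshape to a grid: rows = w, columns = k, cell values = s (keyword arguments are required in pandas 2.x)
flights = df.pivot(index="w", columns="k", values="s")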
f, ax = plt.subplots(figsize=(9, 6))
sns.heatmap(flights, annot=True, linewidths=.5, ax=ax)
plt.show()
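To see what the pivot step produces, here is a minimal sketch with made-up accuracy values (the k, w and s numbers below are placeholders, not data from the question). It only shows the reshaping; the resulting grid is what would be passed to sns.heatmap.

```python
import numpy as np
import pandas as pd

# Hypothetical long-format data: one row per (k, w) pair with an accuracy s.
ks = [1, 2, 3, 4, 5, 6, 7]
ws = [10, 20]
rng = np.random.default_rng(0)
df = pd.DataFrame(
    [(k, w, rng.uniform(0.5, 0.99)) for w in ws for k in ks],
    columns=["k", "w", "s"],
)

# Reshape: rows are w, columns are k, cell values are s.
grid = df.pivot(index="w", columns="k", values="s")
print(grid.shape)  # (2, 7)
```

Each cell of grid now holds the s value for one (w, k) combination, which is why heatmap does not need to be told which column to color by.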
Related
I am new to using histograms in Python.
I want to display 11 histograms by selecting 11 columns from data frame.
Instead of using a subplot, can I have xlabel, ylabel, legend and title for each of these histograms?
df.hist(column=['c1','c2','c3',.......'c11'], figsize=(20,20))
All the columns have different scales.
If it cannot be done with a single hist() call, how can I do it using subplots?
In this case df.hist() returns a 2D array of axes (each ax refers to one of the subplots). You can iterate through these axes and set individual xlabels, ylabels, titles and legends.
import pandas as pd
import numpy as np
from matplotlib import pyplot as plt
df = pd.DataFrame({f'c{i}': np.random.randn(100) * i for i in range(1, 12)})
axes = df.hist(column=[f'c{i}' for i in range(1, 12)], figsize=(20, 20), label='histogram')
for i, ax in enumerate(axes.ravel(), start=1):
    if i <= 11:  # the 4x3 grid has 12 axes, but only 11 columns are plotted
        ax.set_xlabel(f'xlabel for column c{i}')
        ax.set_ylabel(f'ylabel for column c{i}')
        ax.set_title(f'title for column c{i}')
        ax.legend(title=f'legend for column c{i}')
plt.show()
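If you would rather build the grid yourself, which the question also asks about, a sketch along these lines with plt.subplots should work. The 4x3 layout, the random data, and the label text are assumptions for illustration.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Hypothetical data: 11 columns with different scales.
rng = np.random.default_rng(0)
df = pd.DataFrame({f"c{i}": rng.normal(size=100) * i for i in range(1, 12)})

# A 4 x 3 grid gives 12 axes; the axes are not shared, so each column keeps its own scale.
fig, axes = plt.subplots(nrows=4, ncols=3, figsize=(20, 20))
flat_axes = axes.ravel()

for ax, col in zip(flat_axes, df.columns):
    ax.hist(df[col], bins=10, label=col)
    ax.set_xlabel(f"value of {col}")
    ax.set_ylabel("count")
    ax.set_title(f"Histogram of {col}")
    ax.legend()

# Hide the axes that have no column assigned (here: the 12th one).
for ax in flat_axes[len(df.columns):]:
    ax.set_visible(False)

fig.tight_layout()
```

Call plt.show() afterwards to display the figure.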
I took data from excel and plotted it. The first column is date, while the next two columns are prices of different indexes.
I managed to plot them, but they are on separate graphs. I need them plotted against each other with one y-axis (date) and two x-axis.
Also, I can't figure out how to make the line dotted for one and a diamond marker for the other.
import matplotlib.pyplot as plt
import pandas as pd
excel_data = pd.read_excel('Python_assignment_InputData.xlsx', '^GSPTSE')
excel_data.plot(kind='line', x = 'Date', y = 'Bitcoin CAD (BTC-CAD)', color = 'green')
excel_data.plot(kind='line', x = 'Date', y = 'S&P/TSX Composite index (^GSPTSE)', color = 'blue')
plt.show()
I expect Bitcoin and S&P prices to be on one y axis, with dates being on the x axis.
I am providing a sample answer using the iris DataFrame from seaborn. You can modify it to your needs. What you need is a single x axis and two y-axes.
import seaborn as sns
import matplotlib.pyplot as plt
fig, ax = plt.subplots()
iris = sns.load_dataset("iris")
iris.plot(x='sepal_length', y='sepal_width', linestyle=':', ax=ax)
iris.plot(x='petal_length', y='petal_width', marker='d',
          linestyle='None', secondary_y=True, ax=ax)
plt.show()
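For the date-based data in the question, the same idea might look like the sketch below. It uses Matplotlib's twinx() directly rather than the pandas secondary_y option, and the prices are randomly generated stand-ins because the Excel file is not available; only the column names are taken from the question.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Hypothetical stand-in for the Excel sheet: a date column and two price columns.
rng = np.random.default_rng(0)
data = pd.DataFrame({
    "Date": pd.date_range("2019-01-01", periods=30, freq="D"),
    "Bitcoin CAD (BTC-CAD)": 5000 + rng.normal(0, 100, 30).cumsum(),
    "S&P/TSX Composite index (^GSPTSE)": 15000 + rng.normal(0, 50, 30).cumsum(),
})

fig, ax_left = plt.subplots()
ax_right = ax_left.twinx()  # second y-axis sharing the same x-axis (dates)

# Dotted line for one series, diamond markers without a line for the other.
ax_left.plot(data["Date"], data["Bitcoin CAD (BTC-CAD)"],
             color="green", linestyle=":")
ax_right.plot(data["Date"], data["S&P/TSX Composite index (^GSPTSE)"],
              color="blue", marker="D", linestyle="None")

ax_left.set_xlabel("Date")
ax_left.set_ylabel("Bitcoin CAD (BTC-CAD)")
ax_right.set_ylabel("S&P/TSX Composite index (^GSPTSE)")
```

Because the two series live on separate y-axes, their very different price levels do not flatten each other. Call plt.show() to display the result.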
I have made a simple scatterplot using matplotlib showing data from 2 numerical variables (varA and varB) with colors that I defined with a 3rd categorical string variable (col) containing 10 unique colors (corresponding to another string variable with 10 unique names), all in the same Pandas DataFrame with 100+ rows.
Is there an easy way to create a legend for this scatterplot that shows the unique colored dots and their corresponding category names? Or should I somehow group the data and plot each category in a subplot to do this? This is what I have so far:
import matplotlib.pyplot as plt
from matplotlib import colors as mcolors
varA = df['A']
varB = df['B']
col = df['Color']
plt.scatter(varA,varB, c=col, alpha=0.8)
plt.legend()
plt.show()
I had to chime in, because I could not accept that I needed a for-loop to accomplish this. It just seems really annoying and unpythonic, especially when I'm not using Pandas. However, after some searching, I found the answer. ax.scatter() returns a PathCollection (a class from the matplotlib.collections module), and its legend_elements() method builds the legend handles for you. See implementation below:
# imports
import matplotlib.pyplot as plt
import numpy as np
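# create random data and numerical labels
# (every label 0-3 appears at least once, so the handles line up with the 4 names below)
x = np.random.rand(10,2)
y = np.random.permutation(np.arange(10) % 4)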
# create list of categories
labels = ['type1', 'type2', 'type3', 'type4']
# plot
fig, ax = plt.subplots()
scatter = ax.scatter(x[:,0], x[:,1], c=y)
handles, _ = scatter.legend_elements(prop="colors", alpha=0.6) # use my own labels
legend1 = ax.legend(handles, labels, loc="upper right")
ax.add_artist(legend1)
plt.show()
scatterplot legend with custom labels
Source:
https://matplotlib.org/stable/gallery/lines_bars_and_markers/scatter_with_legend.html
https://matplotlib.org/stable/api/collections_api.html#matplotlib.collections.PathCollection.legend_elements
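When the categories are strings in a DataFrame column, as in the question, one way to get numeric labels that stay aligned with their names is pd.factorize. The sketch below assumes a hypothetical column called Name and uses made-up data; num=None is passed so that legend_elements returns one handle per unique code rather than picking a subset.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Hypothetical data: a string column "Name" holds the category of each point.
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "A": rng.random(30),
    "B": rng.random(30),
    "Name": np.tile(["type1", "type2", "type3"], 10),
})

# factorize maps each distinct name to an integer code 0..n-1;
# names[i] is the category that received code i.
codes, names = pd.factorize(df["Name"])

fig, ax = plt.subplots()
scatter = ax.scatter(df["A"], df["B"], c=codes, alpha=0.8)

# num=None asks for one handle per unique code, in ascending code order,
# so the handles line up with names.
handles, _ = scatter.legend_elements(prop="colors", num=None)
ax.legend(handles, list(names), loc="upper right")
```

With this approach the colors come from the colormap rather than from a column of color names, so it fits best when the exact colors do not matter.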
Assuming Color is the column that holds both the colors and the labels, you can simply do the following.
colors = list(df['Color'].unique())
for i in range(0, len(colors)):
    # plot one category at a time so each gets its own legend entry
    data = df.loc[df['Color'] == colors[i]]
    plt.scatter('A', 'B', data=data, color=colors[i], label=colors[i])
plt.legend()
plt.show()
A simple way is to group your data by color, then plot all of the groups on one plot. Pandas has a built-in groupby function. For example:
import matplotlib.pyplot as plt
# group by the column name itself (a one-element list yields tuple keys in pandas 2.x)
for color, group in df.groupby('Color'):
    plt.scatter(group['A'], group['B'], color=color, alpha=0.8, label=color)
plt.legend()
plt.show()
Notice that we call plt.scatter once for each grouping of data. Then we only need to call plt.legend and plt.show once all of the data is in our plot.
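The question mentions a second string column holding the category names. A sketch that groups on both columns, so the legend shows names instead of color strings, could look like this. The column name Name and the sample values are assumptions, and each name is assumed to map to exactly one color.

```python
import pandas as pd
import matplotlib.pyplot as plt

# Hypothetical data: each category name is paired with exactly one color.
df = pd.DataFrame({
    "A": [1, 2, 3, 4, 5, 6],
    "B": [2, 1, 4, 3, 6, 5],
    "Color": ["red", "blue", "green", "red", "blue", "green"],
    "Name": ["apple", "berry", "lime", "apple", "berry", "lime"],
})

fig, ax = plt.subplots()
# Grouping by both columns yields (name, color) tuples as keys.
for (name, color), group in df.groupby(["Name", "Color"]):
    ax.scatter(group["A"], group["B"], color=color, alpha=0.8, label=name)
ax.legend()
```

Call plt.show() to display the figure.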
I have a pandas dataframe with 12 columns, of which I am plotting two: one as the x-axis and one as the y-axis. The x-axis is simply a time series, and the y-axis values are random integers roughly between -5000 and 5000.
Is there any way to make a scatter plot using only these 2 columns where the positive values of y are one color and the negative values are another color?
I have tried so many variations but can't get anything to work. I tried diverging colormaps, color meshes, seaborn colormaps, and boolean masks for negative/positive numbers. I am at my wits' end.
The idea to use a colormap to colorize the points of a scatter is of course justified. If you're using the plt.scatter plot, you can supply the values according to which the colormap chooses the color in the c argument.
Here you only want two values, so c= np.sign(df.y) would be an appropriate choice.
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
df = pd.DataFrame({'x': np.arange(25), 'y': np.random.normal(0,2500,25)})
fig, ax = plt.subplots()
ax.scatter(df.x, df.y, c=np.sign(df.y), cmap="bwr")
plt.show()
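If you want to pick the two colors explicitly rather than take them from a colormap, a variant is to build one color name per point with np.where and pass that to c. This is a sketch on made-up data; the choice of red and blue is arbitrary.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
df = pd.DataFrame({"x": np.arange(25), "y": rng.normal(0, 2500, 25)})

# One color name per point: red for y >= 0, blue for y < 0.
point_colors = np.where(df["y"] >= 0, "red", "blue")

fig, ax = plt.subplots()
points = ax.scatter(df["x"], df["y"], c=point_colors)
```

Call plt.show() to display the figure.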
Split dataframe and plot them separately:
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
df = pd.DataFrame({'x': np.arange(20), 'y': np.random.randn(20)})
# split dataframes
df_plus = df[df.y >= 0]
df_minus = df[df.y < 0]
print(df_plus)
print(df_minus)
fig = plt.figure()
ax = fig.add_subplot(1, 1, 1)
# plot scatter
ax.scatter(df_plus.x, df_plus.y, color='r')
ax.scatter(df_minus.x, df_minus.y, color='b')
ax.autoscale()
plt.show()
If you want to plot the negative dataframe as positive values, write df_minus.y = -df_minus.y.
Make two separate dataframes using boolean masking and the where keyword. The condition is whether the value is > 0 or not. Then plot both dataframes one by one, one on top of the other, with a different color parameter for each.
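The masking idea can be sketched with Series.where, which keeps the values that satisfy a condition and replaces the rest with NaN. The data below is made up, and the variable names are only for illustration.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
df = pd.DataFrame({"x": np.arange(20), "y": rng.normal(size=20)})

mask = df["y"] > 0
# where() keeps values where the condition holds and puts NaN elsewhere;
# scatter skips NaN points, so each call draws only its own half of the data.
y_pos = df["y"].where(mask)
y_neg = df["y"].where(~mask)

fig, ax = plt.subplots()
ax.scatter(df["x"], y_pos, color="r", label="y > 0")
ax.scatter(df["x"], y_neg, color="b", label="y <= 0")
ax.legend()
```

Call plt.show() to display the figure.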
I want to make a subplot for a heatmap where the y-axis matches that of the heatmap (features), but the x axis is some transformation of the mean of the binned values represented for each feature in the heatmap. Below is an example figure:
I can make the heatmap already using imshow, and I have an array of transformed means for each feature with indices that match the heatmap array. How can I produce the subplot on the right of my example figure?
The two main things are setting up the axes to share the y-axis (sharey=True) and (as you have) setting up your transformed data to use the same indices:
import matplotlib.pyplot as plt
from numpy.random import random
from numpy import var
H = random(size=(120,80))
Hvar = var(H, axis=1)
fig, axs = plt.subplots(figsize=(3,3), ncols=2, sharey=True, sharex=False)
plt.sca(axs[0])
plt.imshow(H) #heatmap into current axis
axs[0].set_ylim(0,120)
axs[1].scatter(Hvar, range(len(Hvar)))
plt.show()
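A variation on the same idea, closer to the "transformed mean" in the question, is sketched below. The log transform is only a placeholder for whatever transformation you actually use, and the width ratio and figure size are arbitrary choices.

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
H = rng.random(size=(120, 80))  # rows = features, columns = bins
# Placeholder transformation of the per-feature mean; swap in your own.
transformed = np.log(H.mean(axis=1))

fig, (ax_heat, ax_side) = plt.subplots(
    ncols=2, sharey=True, figsize=(6, 4),
    gridspec_kw={"width_ratios": [4, 1]},  # narrow side panel
)

# aspect="auto" lets the image fill its axes so the rows line up with the side panel.
ax_heat.imshow(H, aspect="auto", origin="lower")
ax_heat.set_ylabel("feature index")

# One value per feature, drawn against the same row indices as the heatmap.
ax_side.plot(transformed, np.arange(H.shape[0]))
ax_side.set_xlabel("log(mean)")
```

Call plt.show() to display the figure.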