Pandas DataFrame Plot: Permanently change default colormap - python

For extensive plotting scripts, I use matplotlib's rcParams to configure some standard plot settings for pandas DataFrames.
This works well for colors and font sizes, but not for the default colormap as described here.
Here's my current approach:
# -*- coding: utf-8 -*-
import pandas as pd
import numpy as np
import matplotlib
import matplotlib.pyplot as plt
from matplotlib import cm
# global plotting options
plt.rcParams.update(plt.rcParamsDefault)
matplotlib.style.use('ggplot')
plt.rcParams['lines.linewidth'] = 2.5
plt.rcParams['axes.facecolor'] = 'silver'
plt.rcParams['xtick.color'] = 'k'
plt.rcParams['ytick.color'] = 'k'
plt.rcParams['text.color'] = 'k'
plt.rcParams['axes.labelcolor'] = 'k'
plt.rcParams.update({'font.size': 10})
plt.rcParams['image.cmap'] = 'Blues' # this doesn't show any effect
# dataframe with random data
df = pd.DataFrame(np.random.rand(10, 3))
# this shows the standard colormap
df.plot(kind='bar')
plt.show()
# this shows the right colormap
df.plot(kind='bar', cmap='Blues')
plt.show()
The first plot does not use the colormap set via rcParams['image.cmap'] (shouldn't it normally do so?):
It only works if I pass the colormap as an argument, as in the second plot:
Is there any way to define the standard colormap for pandas DataFrame plots, permanently?
Thanks in advance!

There is no supported, official way to do it; you are stuck because of pandas's internal _get_standard_colors function that hardcodes the use of matplotlib.rcParams['axes.color_cycle'] and falls back to list('bgrcmyk'):
colors = list(plt.rcParams.get('axes.color_cycle',
                               list('bgrcmyk')))
There are various hacks you can use, however. One of the simplest, which works for all pandas.DataFrame.plot() calls in older pandas versions (where the pandas.tools.plotting module still exists), is to wrap pandas.tools.plotting.plot_frame:
import matplotlib
import pandas as pd
import pandas.tools.plotting as pdplot
def plot_with_matplotlib_cmap(*args, **kwargs):
    # use rcParams['image.cmap'] unless the caller passes colormap= explicitly
    kwargs.setdefault("colormap", matplotlib.rcParams.get("image.cmap", "Blues"))
    return pdplot.plot_frame_orig(*args, **kwargs)
pdplot.plot_frame_orig = pdplot.plot_frame
pdplot.plot_frame = plot_with_matplotlib_cmap
pd.DataFrame.plot = pdplot.plot_frame
To test in a notebook:
%matplotlib inline
import pandas as pd, numpy as np
pd.DataFrame(np.random.random((1000, 10))).plot()
...yields:
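On current pandas versions, where pandas.tools.plotting no longer exists, pandas takes its default line and bar colors from matplotlib's axes.prop_cycle rcParam (the successor of axes.color_cycle). A minimal sketch of a global workaround under that assumption: sample a few colors from the Blues colormap and install them as the color cycle. The sample range (0.3 to 1.0) and the count of 6 are arbitrary choices. Note that columns take colors from the cycle in order, so a 3-column frame gets the first three samples rather than colors spread over the whole colormap, as passing colormap='Blues' would give.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import matplotlib.colors as mcolors
from cycler import cycler

# Sample 6 colors from the 'Blues' colormap (range and count are arbitrary)
blues = plt.get_cmap("Blues")
colors = [mcolors.to_hex(c) for c in blues(np.linspace(0.3, 1.0, 6))]

# pandas reads its default colors from this cycle, so later DataFrame.plot()
# calls without color= or colormap= pick these colors up
plt.rcParams["axes.prop_cycle"] = cycler(color=colors)

df = pd.DataFrame(np.random.rand(10, 3))
ax = df.plot(kind="bar")  # column 0 uses colors[0], column 1 colors[1], ...
plt.show()
```

Because this only changes an rcParam, it can live in the same "global plotting options" block as the other settings, without monkeypatching pandas.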

Related

how can I use pandas to plot the graph?

If I have this length.csv file content:
How can I use pandas to plot a dot graph based on these xy and yx columns?
import pandas as pd
df = pd.read_csv(r'C:\path\to\folder\length.csv')  # raw string, so the backslashes are not escape sequences
Now if you print df, you will get the following
df.plot(x='yx', y='xy', kind='scatter')
You can change kind to other plot types such as 'line' or 'bar'.
Refer to https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.plot.html
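Since the contents of length.csv are not shown, here is a self-contained sketch that reads made-up xy/yx values from an in-memory string via io.StringIO; replace that with the real file path when using your data.

```python
import io
import pandas as pd
import matplotlib.pyplot as plt

# Hypothetical contents standing in for length.csv (the real file is not shown)
csv_text = """xy,yx
1,2.5
2,3.1
3,4.8
4,4.2
"""

df = pd.read_csv(io.StringIO(csv_text))
print(df)

# kind='scatter' draws one dot per row instead of a connected line
ax = df.plot(x="yx", y="xy", kind="scatter")
plt.show()
```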
You can also use matplotlib directly; the plot method in pandas is a wrapper around matplotlib.
If you wish to use Pandas, you can do it as such:
import pandas as pd
df = pd.read_csv('length.csv')
df.plot(x='xy', y='yx')
If you decide to go ahead with matplotlib, you can do as follows:
import matplotlib.pyplot as plt
import pandas as pd
# Include the next line only in a notebook (like Jupyter or Colab)
%matplotlib inline
df = pd.read_csv('length.csv')
plt.plot(df['xy'], df['yx'])
plt.xlabel('xy')
plt.ylabel('yx')
plt.title('xy vs yx Plot')
plt.show()

Seaborn Scatterplot X Values Missing

I have a scatter plot I'm working with, and for some reason I'm not seeing all the x values on my graph.
#%%
from pandas import DataFrame, read_csv
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
file = r"re2.csv"
df = pd.read_csv(file)
#sns.set(rc={'figure.figsize':(11.7,8.27)})
g = sns.FacetGrid(df, col='city')
g.map(plt.scatter, 'type', 'price').add_legend()
This is an image of a small subset of my plots. You can see that Res is displayed; the middle one should display Con and the last one Mlt. These values are all defined in the type column of my data set, but they are not displayed.
Any clue how to fix?
Python is doing what you tell it to do. If you want to generate more interesting plots, just pick different features, presumably ones that make more sense for plotting. See this generic example below.
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
sns.set_theme(style="darkgrid")
tips = sns.load_dataset("tips")
sns.relplot(x="total_bill", y="tip", hue="smoker", data=tips);
Personally, I prefer plotly plots, which are interactive, to seaborn plots:
https://plotly.com/python/line-and-scatter/
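If the goal is for every panel to show the same categories, one option, assuming the missing labels come from each city containing only some of the type values, is to draw the panels with plain matplotlib and map type to fixed x positions with pd.Categorical, labelling every category on every axis. The data frame below is a hypothetical stand-in for re2.csv, and the category order is an assumption.

```python
import pandas as pd
import matplotlib.pyplot as plt

# Hypothetical stand-in for re2.csv
df = pd.DataFrame({
    "city": ["A", "A", "B", "B", "C", "C"],
    "type": ["Res", "Con", "Res", "Mlt", "Con", "Mlt"],
    "price": [250, 180, 300, 420, 210, 390],
})

# Fix the category order once so every panel uses the same x positions
order = ["Res", "Con", "Mlt"]
df["type_code"] = pd.Categorical(df["type"], categories=order).codes

cities = sorted(df["city"].unique())
fig, axes = plt.subplots(1, len(cities), sharey=True,
                         figsize=(3 * len(cities), 3), squeeze=False)
for ax, city in zip(axes[0], cities):
    sub = df[df["city"] == city]
    ax.scatter(sub["type_code"], sub["price"])
    # Label every category on every panel, including ones this city lacks
    ax.set_xticks(range(len(order)))
    ax.set_xticklabels(order)
    ax.set_xlim(-0.5, len(order) - 0.5)
    ax.set_title(f"city = {city}")
plt.show()
```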

Clustermapping in Python using Seaborn

I am trying to create a heatmap with dendrograms in Python using Seaborn, and I have a CSV file with about 900 rows. I'm importing the file as a pandas DataFrame and attempting to plot that, but a large number of the rows are not being represented in the heatmap. What am I doing wrong?
This is the code I have right now, but the heatmap only seems to represent about 49 rows.
Here is an image of the clustermap I've obtained; it is not displaying all of my data.
import seaborn as sns
import pandas as pd
from matplotlib import pyplot as plt
# Data set
df = pd.read_csv('diff_exp_gene.csv', index_col = 0)
# Default plot
sns.clustermap(df, cmap = 'RdBu', row_cluster=True, col_cluster=True)
plt.show()
Thank you.
An alternative approach would be to use imshow in matplotlib. I'm not exactly sure what your question is, but here is a way to plot points on a plane from a CSV file:
import csv
import numpy as np
import matplotlib.pyplot as plt
# Assumes the CSV has TYPE, X and Y columns, with X and Y in 0..127
types = 'some_type'  # placeholder: the TYPE value you want to count
temp = np.zeros((128, 128), dtype=int)
with open('diff_exp_gene.csv', newline='') as in_file:
    data = csv.DictReader(in_file)
    for row in data:
        if row['TYPE'] == types:
            # count how many rows fall into each (Y, X) grid cell
            temp[int(row['Y'])][int(row['X'])] += 1
plt.imshow(temp, cmap='hot', origin='lower')
plt.show()
As far as I know, the keywords that apply to seaborn heatmaps also apply to clustermap, because sns.clustermap passes extra keyword arguments on to sns.heatmap. In that case, all you need to do in your example is set yticklabels=True as a keyword argument in sns.clustermap(). That will label all 900 rows; the rows were already drawn, but only a subset of their labels was shown.
By default, yticklabels is set to "auto", which shows only as many labels as fit without overlapping. The same applies to xticklabels. See more here: https://seaborn.pydata.org/generated/seaborn.heatmap.html
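A minimal sketch of the yticklabels=True suggestion, using random data with 900 hypothetically named rows in place of diff_exp_gene.csv (the row clustering needs scipy installed). With this many rows the labels still crowd each other at the default figure size, so a tall figsize and a small label font help; both values here are arbitrary choices.

```python
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

# Hypothetical stand-in for diff_exp_gene.csv: 900 genes x 6 samples
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.normal(size=(900, 6)),
                  index=[f"gene_{i}" for i in range(900)],
                  columns=[f"sample_{j}" for j in range(6)])

# yticklabels=True is forwarded to sns.heatmap and labels every row
g = sns.clustermap(df, cmap="RdBu", row_cluster=True, col_cluster=True,
                   yticklabels=True, figsize=(8, 60))
# Small font so the 900 labels have a chance to stay legible
g.ax_heatmap.tick_params(axis="y", labelsize=4)
plt.show()
```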

get the date format on a Matplotlib plot's x-axis

I generate a plot using the following code:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
index=pd.date_range('2018-01-01',periods=200)
data=pd.Series(np.random.randn(200),index=index)
plt.figure()
plt.plot(data)
This gives me a plot that looks as follows:
It looks like Matplotlib has decided to format the x-ticks as %Y-%m (source).
I am looking for a way to retrieve this date format, e.g. a function like ax.get_xtickformat() that would return %Y-%m. What is the smartest way to do this?
There is no built-in way to obtain the date format used to label the axes. The reason is that this format is determined at draw time and may even change as you zoom in or out of the plot.
However, you can still determine the format yourself. This requires drawing the figure first, so that the tick locations are fixed. Then you can query the formats used by the automatic formatting and select the one that would be chosen for the current view.
Note that the following assumes that an AutoDateFormatter, or a formatter subclassing it, is in use (which should be the case by default). It also relies on private attributes such as _locator, so it may break in other matplotlib versions.
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
index=pd.date_range('2018-01-01',periods=200)
data=pd.Series(np.random.randn(200),index=index)
plt.figure()
plt.plot(data)
def get_fmt(axis):
    # draw first so that the locator has chosen its tick frequency
    axis.axes.figure.canvas.draw()
    formatter = axis.get_major_formatter()
    locator_unit_scale = float(formatter._locator._get_unit())
    # same lookup AutoDateFormatter performs when formatting a tick
    fmt = next((fmt for scale, fmt in sorted(formatter.scaled.items())
                if scale >= locator_unit_scale),
               formatter.defaultfmt)
    return fmt
print(get_fmt(plt.gca().xaxis))
plt.show()
This prints %Y-%m.
If you want to set the date format yourself instead, pass a format string to DateFormatter, e.g. myFmt = DateFormatter("%d-%m-%Y"):
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from matplotlib.dates import DateFormatter
index=pd.date_range('2018-01-01',periods=200)
data=pd.Series(np.random.randn(200),index=index)
fig, ax = plt.subplots()
ax.plot(index, data)
myFmt = DateFormatter("%d-%m-%Y")
ax.xaxis.set_major_formatter(myFmt)
fig.autofmt_xdate()
plt.show()
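Going the other way, once a fixed DateFormatter is installed, its format string is stored in the formatter's fmt attribute, so a small helper can cover both the automatic and the fixed case. This is a sketch: the name get_date_fmt is hypothetical, and its AutoDateFormatter branch repeats the private-attribute lookup from get_fmt, so it may break in other matplotlib versions.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import matplotlib.dates as mdates

def get_date_fmt(axis):
    """Hypothetical helper: strftime format used by the axis' major date ticks."""
    formatter = axis.get_major_formatter()
    if isinstance(formatter, mdates.DateFormatter):
        # A fixed formatter keeps its format string in .fmt
        return formatter.fmt
    if isinstance(formatter, mdates.AutoDateFormatter):
        # Tick locations (and hence the chosen format) are only known after a draw
        axis.axes.figure.canvas.draw()
        # Private attributes, same lookup as get_fmt; may change between versions
        unit = float(formatter._locator._get_unit())
        return next((fmt for scale, fmt in sorted(formatter.scaled.items())
                     if scale >= unit), formatter.defaultfmt)
    return None  # some other kind of formatter without a single format string

index = pd.date_range("2018-01-01", periods=200)
data = pd.Series(np.random.randn(200), index=index)

fig1, ax1 = plt.subplots()
ax1.plot(index, data)
print(get_date_fmt(ax1.xaxis))  # format chosen automatically at draw time

fig2, ax2 = plt.subplots()
ax2.plot(index, data)
ax2.xaxis.set_major_formatter(mdates.DateFormatter("%d-%m-%Y"))
print(get_date_fmt(ax2.xaxis))  # prints %d-%m-%Y
```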

Using a colormap for a pandas Series

I have a pandas Series of complex numbers, which I would like to plot. Currently, I am looping through each point and assigning it a color. I would prefer to generate the plot without looping over each point; using Series.plot() would be preferable, but converting the Series to NumPy is OK too.
Here is an example of what I currently have:
import pandas as pd
import numpy as np
from matplotlib import pyplot
s = pd.Series((1+np.random.randn(500)*0.05)*np.exp(1j*np.linspace(-np.pi, np.pi, 500)))
cmap = pyplot.cm.viridis
for i, val in enumerate(s):
    pyplot.plot(np.real(val), np.imag(val), 'o', ms=10, color=cmap(i/(len(s)-1)))
pyplot.show()
You can use pyplot.scatter, which allows coloring of points based on a value.
pyplot.scatter(np.real(s), np.imag(s), s=50, c=np.arange(len(s)), cmap='viridis')
Here, we set c to an increasing sequence to get the same result as in the question.
You can simply plot the real and imaginary parts of the series without a loop.
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
s = pd.Series((1+np.random.randn(500)*0.05)*np.exp(1j*np.linspace(-np.pi, np.pi, 500)))
plt.plot(s.values.real,s.values.imag, marker="o", ls="")
plt.show()
However, you need to use a scatter plot if you want to have different colors:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
s = pd.Series((1+np.random.randn(500)*0.05)*np.exp(1j*np.linspace(-np.pi, np.pi, 500)))
plt.scatter(s.values.real,s.values.imag, c = range(len(s)), cmap=plt.cm.viridis)
plt.show()
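Since the question prefers pandas' own plotting, the same idea can be expressed with DataFrame.plot.scatter: put the real part, the imaginary part and the value that drives the colormap into columns, then pass the name of the color column as c. A sketch; the column names re, im and order are arbitrary.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

s = pd.Series((1 + np.random.randn(500) * 0.05) * np.exp(1j * np.linspace(-np.pi, np.pi, 500)))

# One column per coordinate plus a value that is mapped through the colormap
df = pd.DataFrame({
    "re": s.to_numpy().real,
    "im": s.to_numpy().imag,
    "order": np.arange(len(s)),
})
ax = df.plot.scatter(x="re", y="im", c="order", colormap="viridis", s=50)
plt.show()
```

Because c names a numeric column, pandas also adds a colorbar for it by default.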
