Create heatmap using pandas TimeSeries - python

I need to create a Matplotlib heatmap (pcolormesh) using a pandas DataFrame time series column (df_all.ts) as my X-axis.
How can I convert a pandas time series column into something that can be used as the X-axis in np.meshgrid(x, y) to create the heatmap? My workaround is to create a Matplotlib drange with the same parameters as the pandas column, but is there a simpler way?
x = pd.date_range(df_all.ts.min(),df_all.ts.max(),freq='H')
xt = mdates.drange(df_all.ts.min(), df_all.ts.max(), dt.timedelta(hours=1))
y = arange(ylen)
X,Y = np.meshgrid(xt, y)

I do not know what you mean by heat map for a time series, but for a dataframe you may do as below:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from itertools import product
from string import ascii_uppercase
from matplotlib import patheffects
m, n = 4, 7 # 4 rows, 7 columns
df = pd.DataFrame(np.random.randn(m, n),
                  columns=list(ascii_uppercase[:n]),
                  index=list(ascii_uppercase[-m:]))
ax = plt.imshow(df, interpolation='nearest', cmap='Oranges').axes
_ = ax.set_xticks(np.linspace(0, n-1, n))
_ = ax.set_xticklabels(df.columns)
_ = ax.set_yticks(np.linspace(0, m-1, m))
_ = ax.set_yticklabels(df.index)
ax.grid(False)  # use a boolean; the string 'off' is not reliable across Matplotlib versions
ax.xaxis.tick_top()
Optionally, to print the actual values in the middle of each square, with a shadow for readability, you may do:
path_effects = [patheffects.withSimplePatchShadow(shadow_rgbFace=(1,1,1))]
for i, j in product(range(m), range(n)):
    _ = ax.text(j, i, '{0:.2f}'.format(df.iloc[i, j]),
                size='medium', ha='center', va='center',
                path_effects=path_effects)
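Returning to the time-axis part of the question, here is a minimal sketch of one possible approach, not necessarily the simplest one. `matplotlib.dates.date2num` converts the timestamps into the float day numbers that Matplotlib uses internally, and `np.meshgrid` accepts those. The hourly `ts` index, `ylen`, and the random `Z` values are hypothetical stand-ins for `df_all.ts` and the real data.

```python
import matplotlib.dates as mdates
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Hypothetical stand-in for df_all.ts: 25 hourly timestamps, used as the edges of 24 cells
ts = pd.Timestamp("2021-01-01") + pd.to_timedelta(np.arange(25), unit="h")
ylen = 5

# Convert the timestamps to Matplotlib's float date numbers
xt = mdates.date2num(ts.to_pydatetime())
y = np.arange(ylen + 1)
X, Y = np.meshgrid(xt, y)

# One value per cell, so Z is one smaller than the edge grid in each direction
Z = np.random.rand(ylen, len(xt) - 1)

fig, ax = plt.subplots()
mesh = ax.pcolormesh(X, Y, Z, cmap="Oranges")
ax.xaxis_date()        # interpret the x values as dates
fig.autofmt_xdate()    # rotate the date tick labels
fig.colorbar(mesh, ax=ax)
```

Call plt.show() afterwards to display the figure.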

Related

Consistent color argument between matplotlib scatter to matplotlib plot?

I'm hoping to use matplotlib to plot inter-annual variation of monthly data (below). By passing c=ds['time.year'] in plt.scatter(), I achieve the desired outcome. However, I would like to be able to connect the points with an analogous plt.plot() call. Is this possible?
import pandas as pd
import matplotlib.pyplot as plt
import xarray as xr
# create y data
y = []
for yr in range(10):
    for mo in range(12):
        y.append(yr+mo+(yr*mo)**2)
# create datetime vector
t = pd.date_range(start='1/1/2010', periods=120, freq='M')
# combine in DataArray
ds = xr.DataArray(y, coords={'time':t}, dims=['time'])
# scatter plot with color
im = plt.scatter(ds['time.month'], ds.values, c=ds['time.year'])
plt.colorbar(im)
I have tried the following, but it does not work:
plt.plot(ds['time.month'], ds.values, c=ds['time.year'])
You can create a norm that maps the range of years to the range of colors. The norm, together with the colormap used, can serve as input for a ScalarMappable to create an accompanying colorbar. With the default 'viridis' colormap the code could look like:
import matplotlib.pyplot as plt
from matplotlib.cm import ScalarMappable
import pandas as pd
import xarray as xr
y = []
for yr in range(10):
    for mo in range(12):
        y.append(yr + mo + (yr * mo) ** 2)
t = pd.date_range(start='1/1/2010', periods=120, freq='M')
ds = xr.DataArray(y, coords={'time': t}, dims=['time'])
norm = plt.Normalize(ds['time.year'].min(), ds['time.year'].max())
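cmap = plt.get_cmap('viridis')  # plt.cm.get_cmap is not available in newer Matplotlib releases
for year in range(int(ds['time.year'].min()), int(ds['time.year'].max()) + 1):
    plt.plot(ds['time.month'][ds['time.year'] == year],
             ds.values[ds['time.year'] == year],
             ls='-', marker='o', color=cmap(norm(year)))
# pass the axes explicitly; newer Matplotlib versions need it for a standalone ScalarMappable
plt.colorbar(ScalarMappable(cmap=cmap, norm=norm), ax=plt.gca())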
plt.xticks(range(1, 13))
plt.show()
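To make the norm and colormap step concrete, here is a small sketch of what the two objects compute on their own. The year range 2010 to 2019 matches the example data above; the variable names `first`, `last`, and `color_2015` are only for illustration, and nothing is plotted.

```python
import matplotlib.pyplot as plt

norm = plt.Normalize(2010, 2019)   # maps 2010 to 0.0 and 2019 to 1.0
cmap = plt.get_cmap('viridis')

first = norm(2010)
last = norm(2019)
color_2015 = cmap(norm(2015))      # RGBA tuple used for the 2015 line
print(first, last, color_2015)
```

Because each line gets its color from cmap(norm(year)), a colorbar built from the same norm and cmap stays consistent with the plotted lines.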

Stripplot color points based on date

I have data with various values of A, B, C and D on different dates. I want to make a strip plot of these points such that points from recent dates are shaded darker (or have a higher alpha value) than points from earlier dates.
This is what I have right now. All I need is to shade the points by date within each bucket, but I have not been able to figure that out.
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
import matplotlib as mlp
plt.style.use("ggplot")
data = pd.DataFrame({"Date":pd.date_range(start="2020-01-06", end="2020-08-10", freq="W-MON"),
                     "A":[np.random.randint(-5, 50) for _ in range(len(pd.date_range(start="2020-01-06", end="2020-08-10", freq="W-MON")))],
                     "B":[np.random.randint(-5, 50) for _ in range(len(pd.date_range(start="2020-01-06", end="2020-08-10", freq="W-MON")))],
                     "C":[np.random.randint(-10, 50) for _ in range(len(pd.date_range(start="2020-01-06", end="2020-08-10", freq="W-MON")))],
                     "D":[np.random.randint(9, 50) for _ in range(len(pd.date_range(start="2020-01-06", end="2020-08-10", freq="W-MON")))]})
data.set_index("Date", inplace=True)
data.head()
sns.catplot(data=data, aspect=15/6, height=6)
This is the result of the above code
A scatter plot with randomized x-displacements can be used to apply one colormap per column.
To illustrate the effect, the example below uses random data with the most recent values being the largest.
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
plt.style.use("ggplot")
dates = pd.date_range(start="2020-01-06", end="2020-08-10", freq="W-MON")
N = len(dates)
data = pd.DataFrame({"Date": dates,
"A": 30 + np.random.uniform(-5, 8, N).cumsum(),
"B": 20 + np.random.uniform(-4, 9, N).cumsum(),
"C": 25 + np.random.uniform(-4, 7, N).cumsum(),
"D": 40 + np.random.uniform(-2, 8, N).cumsum()})
data.set_index("Date", inplace=True)
columns = data.columns
for col_id, (column, cmap) in enumerate(zip(columns, ['Reds', 'Blues', 'Greens', 'Purples'])):
    plt.scatter(col_id + np.random.uniform(-0.2, 0.2, N), data[column], c=range(N), cmap=cmap)
plt.xticks(range(len(columns)), columns)
plt.show()
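The question also mentions varying the alpha value instead of the shade. A minimal sketch of that variant, assuming a single column whose points are ordered oldest first: give scatter one RGBA row per point and let only the alpha channel change. The base color 'tab:blue', the alpha range 0.2 to 1.0, and the value of `N` are arbitrary choices for illustration.

```python
import matplotlib.pyplot as plt
import numpy as np
from matplotlib.colors import to_rgba

N = 32  # hypothetical number of dates, oldest first
values = 30 + np.random.uniform(-5, 8, N).cumsum()

# One RGBA row per point: same base color, alpha rising from 0.2 to 1.0
colors = np.tile(to_rgba('tab:blue'), (N, 1))
colors[:, 3] = np.linspace(0.2, 1.0, N)

x = np.random.uniform(-0.2, 0.2, N)  # random jitter around category position 0
points = plt.scatter(x, values, c=colors)
```

Repeating this per column with a different base color and x offset would give one faded strip per bucket.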

How to use pandas with matplotlib to create 3D plots

I am struggling a bit with the pandas transformations needed to make data render in 3D in Matplotlib. The data I have is usually in columns of numbers (usually time and some value). So let's create some test data to illustrate.
import pandas as pd
pattern = ("....1...."
           "....1...."
           "..11111.."
           ".1133311."
           "111393111"
           ".1133311."
           "..11111.."
           "....1...."
           "....1....")
# create the data and coords
Zdata = list(map(lambda d:0 if d == '.' else int(d), pattern))
Zinverse = list(map(lambda d:1 if d == '.' else -int(d), pattern))
Xdata = [x for y in range(1,10) for x in range(1,10)]
Ydata = [y for y in range(1,10) for x in range(1,10)]
# pivot the data into columns
data = [d for d in zip(Xdata,Ydata,Zdata,Zinverse)]
# create the data frame
df = pd.DataFrame(data, columns=['X','Y','Z',"Zi"], index=zip(Xdata,Ydata))
df.head(5)
Edit: This block of data is demo data that would normally come from a query on a database that may need more cleaning and transforms before plotting. In this case the data is already aligned and there are no problems aside from having one extra column we don't need (Zi).
So the numbers in pattern are transferred into height data in the Z column of df ('Zi' being the inverse image) and with that as the data frame I've struggled to come up with this pivot method which is 3 separate operations. I wonder if that can be better.
import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d import Axes3D
import matplotlib.cm as cm
fig = plt.figure()
ax = fig.add_subplot(111, projection='3d')
Xs = df.pivot(index='X', columns='Y', values='X').values
Ys = df.pivot(index='X', columns='Y', values='Y').values
Zs = df.pivot(index='X', columns='Y', values='Z').values
ax.plot_surface(Xs,Ys,Zs, cmap=cm.RdYlGn)
plt.show()
Although I have something working, I feel there must be a better way than what I'm doing. On a big data set I would imagine doing 3 pivots is an expensive way to plot something. Is there a more efficient way to transform this data?
I guess you can avoid some steps during the preparation of the data by not using pandas (but only NumPy arrays) and by using some convenience functions provided by NumPy such as linspace and meshgrid.
I rewrote your code to do so, trying to keep the same logic and the same variable names:
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.cm as cm
pattern = ("....1...."
           "....1...."
           "..11111.."
           ".1133311."
           "111393111"
           ".1133311."
           "..11111.."
           "....1...."
           "....1....")
# Extract the value according to your logic
Zdata = list(map(lambda d:0 if d == '.' else int(d), pattern))
# Assuming the pattern is always a square
size = int(len(Zdata) ** 0.5)
# Create a mesh grid for plotting the surface
Xdata = np.linspace(1, size, size)
Ydata = np.linspace(1, size, size)
Xs, Ys = np.meshgrid(Xdata, Ydata)
# Convert the Zdata to a numpy array with the appropriate shape
Zs = np.array(Zdata).reshape((size, size))
fig = plt.figure()
ax = fig.add_subplot(111, projection='3d')
# Plot the surface
ax.plot_surface(Xs, Ys, Zs, cmap=cm.RdYlGn)
plt.show()
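If the data has to stay in a DataFrame (for example because it comes from a query), a possible middle ground is to pivot only once for the heights and rebuild the coordinate grids from the pivot's own row and column labels. This is a sketch under the assumption that every (X, Y) pair occurs exactly once, as in the demo data; the DataFrame here is built without the tuple index and the Zi column for brevity.

```python
import numpy as np
import pandas as pd

pattern = ("....1...."
           "....1...."
           "..11111.."
           ".1133311."
           "111393111"
           ".1133311."
           "..11111.."
           "....1...."
           "....1....")
Zdata = [0 if d == '.' else int(d) for d in pattern]
Xdata = [x for y in range(1, 10) for x in range(1, 10)]
Ydata = [y for y in range(1, 10) for x in range(1, 10)]
df = pd.DataFrame({'X': Xdata, 'Y': Ydata, 'Z': Zdata})

# Pivot once for the heights: rows are X values, columns are Y values
Z = df.pivot(index='X', columns='Y', values='Z')

# Rebuild the coordinate grids from the pivot's labels ('ij' keeps X along the rows)
Xs, Ys = np.meshgrid(Z.index.to_numpy(), Z.columns.to_numpy(), indexing='ij')
Zs = Z.to_numpy()
```

Xs, Ys and Zs have the same shape and can be passed to ax.plot_surface exactly as in the question.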

Use dataframe column names as labels in pylab.plot

I would like to plot the data in a dataframe and have the column headers be the labels. I tried this:
dfm.columns = ['a','b']
plot(dfm.cumsum(), label= dfm.columns.values)
legend(loc='upper left')
But got this: both lines were labeled ['a','b']. Instead, I'd like the blue line to be labeled a and the green line to be labeled b, using pylab.
I think it's the way you have your data set up in part of the code you're not showing.
Here's an example, I used df.plot() in this case.
import pandas as pd
import random
import matplotlib.pyplot as plt
x = [random.randint(10,20) for r in range(100)]
y = [random.randint(0,10) for r in range(100)]
df = pd.DataFrame([x,y]).T #T for transpose
df.columns=['a','b']
df.plot(kind='line')
plt.legend(loc='upper left')
plt.show()
Edit
pylab version
import pandas as pd
import random
import matplotlib.pylab as plt
x = [random.randint(10,20) for r in range(100)]
y = [random.randint(0,10) for r in range(100)]
df = pd.DataFrame([x,y]).T
plt.plot(df)
plt.legend(['a','b'],loc='upper left')
plt.show()
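To take the labels from the column names rather than typing them by hand, one option is to keep the line objects that plot returns and pair them with the columns in the legend call. This is a sketch with a hypothetical two-column `dfm` filled with random integers; it relies on plot drawing one line per column, in column order.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Hypothetical two-column frame standing in for dfm
dfm = pd.DataFrame(np.random.randint(0, 10, size=(100, 2)), columns=['a', 'b'])

fig, ax = plt.subplots()
# plot() returns one Line2D per column, in column order
lines = ax.plot(dfm.cumsum())
# Pair each line with its column name
ax.legend(lines, list(dfm.columns), loc='upper left')
```

Call plt.show() afterwards to display the figure.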

Normalizing pandas DataFrame rows by their sums

What is the most idiomatic way to normalize each row of a pandas DataFrame? Normalizing the columns is easy, so one (very ugly!) option is:
(df.T / df.T.sum()).T
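Pandas broadcasting rules prevent df / df.sum(axis=1) from doing this, because the division aligns the index of the row sums with the DataFrame's columns rather than with its rows.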
To overcome the broadcasting issue, you can use the div method:
df.div(df.sum(axis=1), axis=0)
See pandas User Guide: Matching / broadcasting behavior
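A small worked example of the div approach, using a made-up two-row frame so the arithmetic is easy to follow by hand:

```python
import pandas as pd

df = pd.DataFrame({'a': [1, 3], 'b': [1, 1], 'c': [2, 4]}, index=['r1', 'r2'])

row_sums = df.sum(axis=1)              # r1 sums to 4, r2 sums to 8
normalized = df.div(row_sums, axis=0)  # axis=0 aligns row_sums with the row index
print(normalized)
```

Each row of normalized now sums to 1; for instance, row r1 becomes 0.25, 0.25, 0.5.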
I would suggest using scikit-learn's preprocessing module and transposing your dataframe as required:
'''
Created on 05/11/2015
#author: rafaelcastillo
'''
import matplotlib.pyplot as plt
import pandas
import random
import numpy as np
from sklearn import preprocessing
def create_cos(number_graphs,length,amp):
    # This function is used to generate cos-kind graphs for testing
    # number_graphs: to plot
    # length: number of points included in the x axis
    # amp: Y domain modifications to draw different shapes
    x = np.arange(length)
    amp = np.pi*amp
    xx = np.linspace(np.pi*0.3*amp, -np.pi*0.3*amp, length)
    for i in range(number_graphs):
        iterable = (2*np.cos(x) + random.random()*0.1 for x in xx)
        y = np.fromiter(iterable, float)  # np.float was removed in NumPy 1.24
        if i == 0:
            yfinal = y
            continue
        yfinal = np.vstack((yfinal,y))
    return x,yfinal
x,y = create_cos(70,24,3)
data = pandas.DataFrame(y)
x_values = data.columns.values
num_rows = data.shape[0]
fig, ax = plt.subplots()
for i in range(num_rows):
    ax.plot(x_values, data.iloc[i])
ax.set_title('Raw data')
plt.show()
std_scale = preprocessing.MinMaxScaler().fit(data.transpose())
df_std = std_scale.transform(data.transpose())
data = pandas.DataFrame(np.transpose(df_std))
fig, ax = plt.subplots()
for i in range(num_rows):
    ax.plot(x_values, data.iloc[i])
ax.set_title('Data Normalized')
plt.show()
