Plotting hierarchical axis labels with multilevel index using pandas [duplicate]


This question already has answers here:
Adding multi level X axis to matplotlib/Seaborn (month and year)
(1 answer)
Plot two levels of x_ticklabels on a pandas multi-index dataframe [duplicate]
(1 answer)
How to add group labels for bar charts
(2 answers)
Closed 14 days ago.
I'm trying to make a stacked bar chart where the x-axis is a combination of years and quarters.
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
df = pd.DataFrame({'year': [2019, 2019, 2019, 2019, 2020, 2020, 2020, 2020],
'quarter': [1, 2, 3, 4, 1, 2, 3, 4],
'value1': range(1, 9),
'value2': np.ones(8)})
df.set_index(['year', 'quarter'], inplace=True)
df.plot.bar(stacked=True)
plt.tight_layout()
plt.show()
Currently, it looks like this:
However, I would like the axis labels to be hierarchical, such that 2019 and 2020 each show only once at the bottom, with the quarters shown above them.
Is there an easy way to do this other than using a separate subplot for each year?
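One possible approach, sketched here under assumptions rather than taken from the linked answers: keep a single Axes, label each bar with its quarter, and draw one year label centred under each group of four bars. The "Q" prefix, the vertical offset of -0.12 and the name year_centres are illustrative choices, not part of the question.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

df = pd.DataFrame({'year': [2019, 2019, 2019, 2019, 2020, 2020, 2020, 2020],
                   'quarter': [1, 2, 3, 4, 1, 2, 3, 4],
                   'value1': range(1, 9),
                   'value2': np.ones(8)}).set_index(['year', 'quarter'])

ax = df.plot.bar(stacked=True, rot=0)

# Upper row of labels: only the quarter (second index level).
quarters = df.index.get_level_values('quarter')
ax.set_xticklabels([f'Q{q}' for q in quarters])
ax.set_xlabel('')

# Lower row: one year label centred under each group of bars.
# x is given in data coordinates, y in axes coordinates (below the axis).
years = df.index.get_level_values('year')
positions = np.arange(len(df))
year_centres = {}
for year in years.unique():
    centre = positions[years == year].mean()
    year_centres[year] = centre
    ax.text(centre, -0.12, str(year), ha='center', va='top',
            transform=ax.get_xaxis_transform())

plt.tight_layout()
```

Call plt.show() afterwards to display the figure. The offset may need adjusting for other figure sizes or fonts.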

Related

set x axis as column names on barplot

I have a dataframe such as this:
data = {'name': ['Bob', 'Chuck', 'Daren', 'Elisa'],
'100m': [19, 14, 12, 11],
'200m': [36, 25, 24, 24],
'400m': [67, 64, 58, 57],
'800m': [117, 120, 123, 121]}
df = pd.DataFrame(data)
    name  100m  200m  400m  800m
1    Bob    19    36    67   117
2  Chuck    14    25    64   120
3  Daren    12    24    58   123
4  Elisa    11    24    57   121
My task is simple: Plot the times (along the y-axis), with the name of the event (100m, 200m, etc. along the x-axis). The hue of each bar should be determined by the 'name' column, and look something like this.
Furthermore, I would like to overlay the results (not stack). However, there is no functionality in seaborn nor matplotlib to do this.

Instead of using seaborn, which is an API for matplotlib, plot df directly with pandas.DataFrame.plot. matplotlib is the default plotting backend for pandas.
Tested in python 3.11, pandas 1.5.1, matplotlib 3.6.2, seaborn 0.12.1
ax = df.set_index('name').T.plot.bar(alpha=.7, rot=0, stacked=True)
seaborn.barplot does not have an option for stacked bars, however, this can be implemented with seaborn.histplot, as shown in Stacked Bar Chart with Centered Labels.
df must be converted from a wide format to a long format with df.melt
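import seaborn as sns
# melt the dataframe from wide to long format
dfm = df.melt(id_vars='name')
# plot: one bar per event, stacked by name, with the times as weights
ax = sns.histplot(data=dfm, x='variable', weights='value', hue='name', discrete=True, multiple='stack')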
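To make the wide-to-long conversion concrete, here is a minimal sketch of what df.melt produces for the data in the question. The column names 'variable' and 'value' are the pandas defaults, which is why the histplot call uses them.

```python
import pandas as pd

data = {'name': ['Bob', 'Chuck', 'Daren', 'Elisa'],
        '100m': [19, 14, 12, 11],
        '200m': [36, 25, 24, 24],
        '400m': [67, 64, 58, 57],
        '800m': [117, 120, 123, 121]}
df = pd.DataFrame(data)

# Each (name, event) pair becomes one row:
# 4 names x 4 events = 16 rows with columns name, variable, value.
dfm = df.melt(id_vars='name')
print(dfm.head())
```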

Histogram for a dataframe column [duplicate]

This question already has answers here:
Selecting a column to make histogram
(1 answer)
How to plot a histogram of a single dataframe column and exclude 0s
(1 answer)
How do I only plot histogram for only certain columns of a data-frame in pandas
(1 answer)
Closed 7 months ago.
I would like to construct a histogram (or empirical distribution function) for a dataframe column (a column containing a number of daily observations).
The dataframe column has the following structure (below)
Thanks in advance!
df1 = pd.DataFrame({"date": pd.to_datetime(["2021-3-22", "2021-4-7", "2021-4-18", "2021-5-12","2022-3-22", "2022-4-7", "2022-4-18", "2022-5-12"]),
"x": [1, 1, 1, 3, 2, 3,4,2 ]})
date x
0 2021-03-22 1
1 2021-04-07 1
2 2021-04-18 1
3 2021-05-12 3
4 2022-03-22 2
5 2022-04-07 3
6 2022-04-18 4
7 2022-05-12 2

Pandas has built-in plotting that uses matplotlib as its default backend, so you can do it like this:
df1.x.hist()
More: https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.plot.html

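You can do this with pyplot:
from matplotlib import pyplot as plt
plt.hist(df1.x)
# if you want to save the plot to a file, call savefig before show;
# after show() returns, the figure is typically closed and savefig would write an empty image
plt.savefig('filename.png')
# if you just want to look at the plot
plt.show()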
Here's the documentation with all the options: https://matplotlib.org/stable/api/_as_gen/matplotlib.pyplot.hist.html.
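The question also mentions an empirical distribution function, which neither answer covers. A minimal sketch, assuming the usual step-function definition (the fraction of observations less than or equal to each value); the names x_sorted and ecdf are illustrative:

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

df1 = pd.DataFrame({"x": [1, 1, 1, 3, 2, 3, 4, 2]})

# Sort the observations; the i-th sorted value has cumulative share i / n.
x_sorted = np.sort(df1["x"].to_numpy())
ecdf = np.arange(1, len(x_sorted) + 1) / len(x_sorted)

fig, ax = plt.subplots()
ax.step(x_sorted, ecdf, where="post")
ax.set_xlabel("x")
ax.set_ylabel("share of observations <= x")
```

Call plt.show() or fig.savefig(...) afterwards, as in the answer above.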

generate series of plots with pandas dataframe

I have to generate a series of scatter plots (roughly 100 in total).
I have created an example to illustrate the problem.
First do an import.
import pandas as pd
Create a pandas dataframe.
# Create dataframe
data = {'name': ['Jason', 'Jason', 'Tina', 'Tina', 'Tina', 'Jason', 'Tina'],
'report_value': [4, 24, 31, 2, 3, 5, 10],
'coverage_id': ['m1', 'm2', 'm3', 'm4', 'm5', 'm6', 'm7']}
df = pd.DataFrame(data)
print(df)
Output:
  coverage_id   name  report_value
0          m1  Jason             4
1          m2  Jason            24
2          m3   Tina            31
3          m4   Tina             2
4          m5   Tina             3
5          m6  Jason             5
6          m7   Tina            10
The goal is to generate two scatter plots without using a for-loop. The name of the person, Jason or Tina, should be displayed in the title. The report_value should be on the y-axis in both plots and the coverage_id (which is a string) on the x-axis.
I thought I should start with:
df.groupby('name')
Then I need to apply the operation to every group.
This way I have the dataframe grouped by their names. I don't know how to proceed and get Python to make the two plots for me.
Thanks a lot for any help.

I think you can use this solution, but first it is necessary to convert the string column to numeric, then plot, and finally set the x tick labels:
import matplotlib.pyplot as plt
import numpy as np
# map each coverage_id string to an integer position
u, i = np.unique(df.coverage_id, return_inverse=True)
df.coverage_id = i
groups = df.groupby('name')
# Plot
fig, ax = plt.subplots()
ax.margins(0.05) # Optional, just adds 5% padding to the autoscaling
for name, group in groups:
    ax.plot(group.coverage_id,
            group.report_value,
            marker='o',
            linestyle='',
            ms=12,
            label=name)
# one tick per unique coverage_id, labelled with the original strings
ax.set(xticks=range(len(u)), xticklabels=u)
ax.legend()
plt.show()
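Another solution uses seaborn.pairplot (the size parameter was renamed to height in seaborn 0.9):
import numpy as np
import seaborn as sns
u, i = np.unique(df.coverage_id, return_inverse=True)
df.coverage_id = i
g = sns.pairplot(x_vars=["coverage_id"], y_vars=["report_value"], data=df, hue="name", height=5)
# set the tick positions explicitly so that the labels line up with the data
g.set(xticks=range(len(u)), xticklabels=u)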
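Both snippets above draw the groups on one Axes. For the separate plots with the name in the title that the question asks for, a minimal sketch follows. It does use a short loop over the groups (an assumption that this is acceptable), and it relies on matplotlib's support for string categories on the x-axis, so no numeric conversion is needed.

```python
import matplotlib.pyplot as plt
import pandas as pd

data = {'name': ['Jason', 'Jason', 'Tina', 'Tina', 'Tina', 'Jason', 'Tina'],
        'report_value': [4, 24, 31, 2, 3, 5, 10],
        'coverage_id': ['m1', 'm2', 'm3', 'm4', 'm5', 'm6', 'm7']}
df = pd.DataFrame(data)

groups = df.groupby('name')
# squeeze=False always returns a 2D array of Axes, even for a single group.
fig, axes = plt.subplots(1, groups.ngroups, sharey=True,
                         figsize=(10, 4), squeeze=False)

# One subplot per person, titled with the group key.
for ax, (name, group) in zip(axes.flat, groups):
    ax.scatter(group['coverage_id'], group['report_value'])
    ax.set_title(name)
    ax.set_xlabel('coverage_id')
axes.flat[0].set_ylabel('report_value')
fig.tight_layout()
```

For roughly 100 plots, a grid of subplots gets crowded; creating one figure per group inside the same loop and saving it with fig.savefig is probably the more practical variant.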

How to plot multiple pandas columns

I have dataframe total_year, which contains three columns (year, action, comedy).
How can I plot two columns (action and comedy) on y-axis?
My code plots only one:
total_year[-15:].plot(x='year', y='action', figsize=(10,5), grid=True)

Several column names may be provided to the y argument of the pandas plotting function. Those should be specified in a list, as follows.
df.plot(x="year", y=["action", "comedy"])
Complete example:
import matplotlib.pyplot as plt
import pandas as pd
df = pd.DataFrame({"year": [1914,1915,1916,1919,1920],
"action" : [2.6,3.4,3.25,2.8,1.75],
"comedy" : [2.5,2.9,3.0,3.3,3.4] })
df.plot(x="year", y=["action", "comedy"])
plt.show()

By default, pandas.DataFrame.plot() uses the index for the x-axis, and all other numeric columns are used as y values.
So setting year column as index will do the trick:
total_year.set_index('year').plot(figsize=(10,5), grid=True)

When using pandas.DataFrame.plot, it's only necessary to specify a column to the x parameter.
The caveat is, the rest of the columns with numeric values will be used for y.
The following code contains extra columns to demonstrate. Note, 'date' is left as a string. However, if 'date' is converted to a datetime dtype, the plot API will also plot the 'date' column on the y-axis.
If the dataframe includes many columns, some of which should not be plotted, then specify the y parameter as shown in this answer, but if the dataframe contains only columns to be plotted, then specify only the x parameter.
In cases where the index is to be used as the x-axis, then it is not necessary to specify x=.
import pandas as pd
# test data
data = {'year': [1914, 1915, 1916, 1919, 1920],
'action': [2.67, 3.43, 3.26, 2.82, 1.75],
'comedy': [2.53, 2.93, 3.02, 3.37, 3.45],
'test1': ['a', 'b', 'c', 'd', 'e'],
'date': ['1914-01-01', '1915-01-01', '1916-01-01', '1919-01-01', '1920-01-01']}
# create the dataframe
df = pd.DataFrame(data)
# display(df)
year action comedy test1 date
0 1914 2.67 2.53 a 1914-01-01
1 1915 3.43 2.93 b 1915-01-01
2 1916 3.26 3.02 c 1916-01-01
3 1919 2.82 3.37 d 1919-01-01
4 1920 1.75 3.45 e 1920-01-01
# plot the dataframe
df.plot(x='year', figsize=(10, 5), grid=True)
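As a quick check of which columns end up on the y-axis, the labels of the plotted lines can be read back from the returned Axes. This is only a sketch; the name plotted is illustrative.

```python
import matplotlib.pyplot as plt
import pandas as pd

data = {'year': [1914, 1915, 1916, 1919, 1920],
        'action': [2.67, 3.43, 3.26, 2.82, 1.75],
        'comedy': [2.53, 2.93, 3.02, 3.37, 3.45],
        'test1': ['a', 'b', 'c', 'd', 'e'],
        'date': ['1914-01-01', '1915-01-01', '1916-01-01', '1919-01-01', '1920-01-01']}
df = pd.DataFrame(data)

ax = df.plot(x='year', figsize=(10, 5), grid=True)

# Only the numeric columns are drawn; the string columns
# 'test1' and 'date' are skipped.
plotted = [line.get_label() for line in ax.get_lines()]
print(plotted)
```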

xticks values as dataframe column values in matplotlib plot [duplicate]

This question already has answers here:
Using datetime as ticks in Matplotlib
(3 answers)
Closed 5 years ago.
I have the dataframe below:
values years
0 24578.0 2007-09
1 37491.0 2008-09
2 42905.0 2009-09
3 65225.0 2010-09
4 108249.0 2011-09
5 156508.0 2012-09
6 170910.0 2013-09
7 182795.0 2014-09
8 233715.0 2015-09
9 215639.0 2016-09
10 215639.0 TTM
The plotted image is attached. The issue is that I want the years values, '2007-09' to 'TTM', as the xtick labels in the plot.

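One way to do this would be to access the current indices of the xticks in the x data. Use those indices to select the values from df.years and then set the labels to those values:
import matplotlib.pyplot as plt
fig, ax = plt.subplots()
df.plot(ax=ax)
# keep only the ticks that correspond to an existing row (assumes the default 0..n-1 index);
# the automatic ticks can be fractional or fall outside the data range
tick_idx = [int(t) for t in ax.get_xticks() if float(t).is_integer() and 0 <= t < len(df)]
ax.set_xticks(tick_idx)
ax.set_xticklabels(df.years.iloc[tick_idx])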
You could also set the x axis to display all years like so:
fig, ax = plt.subplots()
df.plot(ax=ax, xticks=df.index, rot=45)
ax.set_xticklabels(df.years)
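A self-contained version of the second variant, using the data from the question, might look like the sketch below. It assumes the default integer index and uses ax.tick_params for the rotation so that it applies to every tick.

```python
import matplotlib.pyplot as plt
import pandas as pd

df = pd.DataFrame({
    "values": [24578.0, 37491.0, 42905.0, 65225.0, 108249.0, 156508.0,
               170910.0, 182795.0, 233715.0, 215639.0, 215639.0],
    "years": ["2007-09", "2008-09", "2009-09", "2010-09", "2011-09", "2012-09",
              "2013-09", "2014-09", "2015-09", "2016-09", "TTM"],
})

fig, ax = plt.subplots()
# df["values"] selects the column; df.values would be the underlying array.
df["values"].plot(ax=ax)

# One tick per row, labelled with the matching entry of the years column.
ax.set_xticks(range(len(df)))
ax.set_xticklabels(df["years"].tolist())
ax.tick_params(axis="x", rotation=45)
fig.tight_layout()
```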
