Python Pandas Matplotlib : How to Plot Graph without Numerics? - python

I want to plot a bar graph (or other graphs) in Python from a Pandas dataframe, using two columns that don't contain numeric data. One column is Operating System, the other is Computer Name. I want the graph to show how many systems each OS is running on. The sample data is like below.
How can I plot a bar graph or other graphs for these two columns? When I try the code below:
ax = dfdefault[['Operating System','Computer Name']].plot(kind='bar')
ax.set_xlabel("Hour", fontsize=12)
ax.set_ylabel("V", fontsize=12)
plt.show()
I get this error:
TypeError: Empty 'DataFrame': no numeric data to plot

You will need to count the occurrences of each operating system first and then plot the counts as a bar graph or pie chart. A bar plot expects numeric data, which you don't have yet; counting takes care of this. Here is an example using a pie chart:
import pandas as pd
df = pd.DataFrame(
    [['asd', 'win'],
     ['sdf', 'mac'],
     ['aww', 'win'],
     ['dd', 'linux']],
    columns=['computer', 'os']
)
# Count how many computers run each OS, then plot the counts
df['os'].value_counts().plot.pie()
A bar chart would work similarly. Just change pie to bar.
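For completeness, here is a minimal sketch of the bar variant, reusing the toy data from the answer. The axis label texts are my own choice, not something from the original question.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line for interactive use
import pandas as pd

df = pd.DataFrame(
    [["asd", "win"], ["sdf", "mac"], ["aww", "win"], ["dd", "linux"]],
    columns=["computer", "os"],
)

# value_counts() turns the text column into numbers: one count per OS
counts = df["os"].value_counts()

# The counts are numeric, so a bar plot now works
ax = counts.plot.bar()
ax.set_xlabel("Operating System")   # label text is an assumption
ax.set_ylabel("Number of computers")
```

With the real data, the same two steps would presumably be applied to the 'Operating System' column of dfdefault.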

Related

Stacked bar plot in subplots using pandas .plot()

I created a hypothetical DataFrame containing 3 measurements for 20 experiments. Each experiment is associated with a Subject (3 possibilities).
import random
import numpy as np
import pandas as pd
random.seed(42)  # set seed
tuples = list(zip(*[list(range(20)), random.choices(['Jean', 'Marc', 'Paul'], k=20)]))  # index labels
index = pd.MultiIndex.from_tuples(tuples, names=['num_exp', 'Subject'])  # index
test = pd.DataFrame(np.random.randint(0, 100, size=(20, 3)), index=index, columns=['var1', 'var2', 'var3'])  # DataFrame
test.head()  # first lines
I succeeded in constructing stacked bar plots with the 3 measurements (each bar is an experiment) for each subject:
test.groupby('Subject').plot(kind='bar', stacked=True,legend=False) #plots
Now, I would like to put each plot (for each subject) in a subplot. If I use the "subplots" argument, it gives me the following :
test.groupby('Subject').plot(kind='bar', stacked=True,legend=False,subplots= True) #plot with subplot
It created a subplot for each measurement, because the measurements correspond to the columns of my DataFrame.
I don't know how I could do otherwise, because I need them as columns to create stacked bars.
So here is my question:
Is it possible to construct this kind of figure, with stacked bar plots in subplots (ideally in an elegant way, without iterating)?
Thanks in advance!
I solved my problem with a simple loop, using nothing other than pandas .plot().
Pandas .plot() has an ax parameter that accepts a matplotlib axes object.
So, starting from the list of distinct subjects:
subj= list(dict.fromkeys(test.index.get_level_values('Subject')))
I define my subplots :
fig, axs = plt.subplots(1, len(subj))
Then, I have to iterate for each subplot :
for a in range(len(subj)):
    test.loc[test.index.get_level_values('Subject') == subj[a]].unstack(level=1).plot(ax=axs[a], kind='bar', stacked=True, legend=False, xlabel='', fontsize=10)  # plot
    axs[a].set_title(subj[a], pad=0, fontsize=15)  # title
    axs[a].tick_params(axis='y', pad=0, size=1)  # yticks
And it works well!
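A self-contained variant of the same idea might look like the sketch below. It iterates over groupby(level='Subject') instead of filtering with .loc, and it drops the Subject level rather than unstacking it; both are my substitutions, not the original poster's code. The NumPy seed and the figure size are also assumptions.

```python
import random

import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line for interactive use
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

random.seed(42)
np.random.seed(42)  # assumption: the question only seeds the random module

# Rebuild a frame shaped like the one in the question
index = pd.MultiIndex.from_arrays(
    [list(range(20)), random.choices(["Jean", "Marc", "Paul"], k=20)],
    names=["num_exp", "Subject"],
)
test = pd.DataFrame(
    np.random.randint(0, 100, size=(20, 3)),
    index=index,
    columns=["var1", "var2", "var3"],
)

# One (subject, sub-frame) pair per distinct subject
groups = list(test.groupby(level="Subject"))

# One subplot per subject; squeeze=False keeps axs two-dimensional
fig, axs = plt.subplots(1, len(groups), figsize=(4 * len(groups), 4),
                        sharey=True, squeeze=False)

for ax, (subject, frame) in zip(axs[0], groups):
    # Drop the Subject level so the x tick labels are just the experiment numbers
    frame.droplevel("Subject").plot(ax=ax, kind="bar", stacked=True, legend=False)
    ax.set_title(subject)

fig.tight_layout()
```

The loop is still there, but each iteration is a single .plot() call on an existing axes, which is what the ax parameter is for.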

Seaborn Catplot not showing text labels in x axis

I am trying to make a bar catplot from a long dataframe that has three columns: property, value and playlist. It was originally a wide dataframe that I converted to long format using pd.melt(). The problem is that whenever I plot it with a bar catplot, the categorical data from the property column, which should label the x axis, just shows up as numbers.
Here is my code:
#bar catplot
bar_catplot = sns.catplot(
kind="bar", x="property", y="value", hue="playlist", legend=True, data=long_frame2
)
bar_catplot_figure = bar_catplot.fig
catplot_render = mpld3.fig_to_html(bar_catplot_figure)
Thank you in advance!

Create a horizontal waterfall chart with python matplotlib

I am trying to create a waterfall chart, which is like a bar chart except that each bar starts where the previous bar ends, so you can see the total and how it breaks down.
I am trying to create this chart in Python, but matplotlib has no built-in chart type called waterfall.
I found code for a vertical waterfall, but I could not transform it to horizontal.
How can I transform a matplotlib barh chart, for example, into a horizontal waterfall?
I want to create a HORIZONTAL waterfall.
For example, I am trying to make each bar in a matplotlib barh chart start at the end of the previous one, but I do not think I am approaching the problem the right way, because I have no results so far.
It should look like this:
Code to create the plot:
my_plot = trans.plot(
    kind='barh',
    stacked=True,
    bottom=blank, legend=None,
    figsize=(10, 5)
)
How do I separate the bars?
EDIT
I have found this ready to use python package, but it doesn't work with dataframes, so I cannot use it.
import waterfall_chart
from matplotlib import transforms
a = ['sales','returns','credit fees','rebates','late charges','shipping']
b = [10,-30,-7.5,-25,95,-7]
my_plot = waterfall_chart.plot(a, b, rotation_value=30, sorted_value=True, threshold=0.2,
formatting="$ {:,.1f}", net_label="end result", other_label="misc",
Title="chart", x_lab="X", y_lab="money", blue_color="blue",
green_color="#95ff24", red_color="r")
rot = transforms.Affine2D().rotate_deg(90)
my_plot.show()
I also found this tutorial, with code, for a vertical waterfall chart.
https://pbpython.com/waterfall-chart.html.
It works great, but I didn't manage to reproduce the same thing for a horizontal waterfall.
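One possible way to get a horizontal waterfall directly from a DataFrame is sketched below. It is not taken from the tutorial or from the waterfall_chart package. It assumes a frame named trans with a single, hypothetical 'amount' column (the real contents of trans are not shown in the question) and reuses the example values from the edit above. The key point is that in matplotlib's barh the offset of each bar is set with left, not bottom.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line for interactive use
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical stand-in for the question's `trans` frame
trans = pd.DataFrame(
    {"amount": [10, -30, -7.5, -25, 95, -7]},
    index=["sales", "returns", "credit fees", "rebates", "late charges", "shipping"],
)

# Each bar starts where the running total stood before its own value was added
running_total = trans["amount"].cumsum()
starts = running_total - trans["amount"]

# Color choice is an assumption: green for increases, red for decreases
colors = ["green" if value >= 0 else "red" for value in trans["amount"]]

fig, ax = plt.subplots(figsize=(10, 5))
ax.barh(list(trans.index), trans["amount"].to_numpy(),
        left=starts.to_numpy(), color=colors)

# Final bar showing the net result, measured from zero
ax.barh(["net"], [running_total.iloc[-1]], color="blue")

ax.invert_yaxis()  # first category at the top, like reading a table
ax.set_xlabel("money")
```

Here starts plays the role that blank plays in the vertical version: it is the invisible offset that pushes each bar to the end of the previous one.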

Why is matplotlib .plot(kind='bar') plot so different to .plot()

This may be a very stupid question, but when plotting a Pandas DataFrame using .plot() it is very quick and produces a graph with an appropriate index. As soon as I try to change this to a bar chart, it just seems to lose all formatting and the index goes wild. Why is this the case? And is there an easy way to just plot a bar chart with the same format as the line chart?
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
df = pd.DataFrame()
df['Date'] = pd.date_range(start='01/01/2012', end='31/12/2018')
df['Value'] = np.random.randint(low=5, high=100, size=len(df))
df.set_index('Date', inplace=True)
df.plot()
plt.show()
df.plot(kind='bar')
plt.show()
Update:
For comparison, if I take the data and put it into Excel, then create a line plot and a bar ('column') plot it instantly will convert the plot and keep the axis labels as they were for the line plot. If I try to produce many (thousands) of bar charts in Python with years of daily data, this takes a long time. Is there just an equivalent way of doing this Excel transformation in Python?
Pandas bar plots are categorical in nature; i.e. each bar is a separate category and gets its own label. Plotting numeric bar plots (in the same manner as line plots) is not currently possible with pandas.
In contrast matplotlib bar plots are numerical if the input data is numbers or dates. So
plt.bar(df.index, df["Value"])
produces a bar plot with a numeric (date) x-axis.
Note, however, that because there are 2557 data points in your dataframe, distributed over only some hundreds of pixels, not all bars are actually plotted. Conversely, if you want each bar to be shown, it needs to be at least one pixel wide in the final image. This means that with 5% margins on each side your figure needs to be more than 2800 pixels wide, or saved in a vector format.
So rather than showing daily data, maybe it makes sense to aggregate to monthly or quarterly data first.
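As a rough sketch of that suggestion, the daily series could be resampled to monthly means before plotting with matplotlib's bar. The choice of the mean as the aggregate, the 'MS' (month start) frequency, and the bar width of 25 days are all assumptions made for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line for interactive use
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Same kind of data as in the question: one random value per day, 2012-2018
dates = pd.date_range(start="2012-01-01", end="2018-12-31")
df = pd.DataFrame(
    {"Value": np.random.randint(low=5, high=100, size=len(dates))},
    index=dates,
)

# Aggregate the daily values to one mean per month, labeled by month start
monthly = df["Value"].resample("MS").mean()

fig, ax = plt.subplots(figsize=(12, 4))
# On a date axis the bar width is measured in days
ax.bar(monthly.index, monthly.to_numpy(), width=25)
ax.set_ylabel("Mean Value per month")
```

This reduces 2557 bars to 84, which fit comfortably in an ordinary raster image while keeping a real date axis.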
The default .plot() connects all your data points with straight lines and produces a line plot.
On the other hand, .plot(kind='bar') plots each data point as a discrete bar. To get proper formatting on the x-axis, you will have to modify the tick labels after plotting.
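A minimal sketch of modifying the tick labels after plotting is shown below. It relies on pandas placing the bars at integer positions 0, 1, 2, ... and labels only the first day of each month. The one-year date range, the label format and the bar width are assumptions chosen to keep the example small.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line for interactive use
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# One year of daily data keeps the example small
dates = pd.date_range(start="2018-01-01", end="2018-12-31")
df = pd.DataFrame(
    {"Value": np.random.randint(low=5, high=100, size=len(dates))},
    index=dates,
)

ax = df.plot(kind="bar", width=1.0, legend=False, figsize=(12, 4))

# Bars sit at positions 0..len(df)-1, so pick the positions of each month's first day
month_starts = [pos for pos, day in enumerate(df.index) if day.day == 1]

# Replace the per-bar labels with one label per month
ax.set_xticks(month_starts)
ax.set_xticklabels(
    [df.index[pos].strftime("%Y-%m") for pos in month_starts],
    rotation=45,
    ha="right",
)
```

This keeps the pandas bar plot but replaces its one-label-per-bar axis with a sparse, readable one.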

In pandas, how to properly label a Bar Chart with massive number of records?

I have a pandas series with about 200 rows, each containing an integer count.
I am trying to plot the series on a bar graph, using the following line of code:
plt.figure(figsize=(40, 40))
country_wise_counts.plot(kind='bar', y='Number of Users', x='Country Name', subplots=False, legend = False, fontsize=12)
and I get a plot which clearly is not helpful.
So my first question is:
Is it a sane attempt, trying to plot my data this way, when I have 200 separate values that I want to plot on a bar graph?
If yes, how do I do what I want to do?
