Pandas groupby group visualization by dividing between groups - python

I am facing a very annoying problem. I have a dataset where I have the sales amounts for different regions and years.
I would like to visualize the yearly aggregated sales amounts based on different regions.
Below is my groupby code:
groups = df.groupby(["Region", "Year"])["Sales"].sum()
groups.plot.bar(color="blue")
plt.show()
And the output I get looks like this:
I have two questions:
1. How could I separate the region and year bars from each other? As it is, my chart looks really confusing. A separator line, a highlighter, or even a bigger gap between groups would be a good solution for me.
(The original post showed two example images of the desired result here.)
I have no clue at all how to solve this problem.
2. How could I have this chart sorted so that the region with the most sales comes first, followed by the region with the second most yearly sales, and so on? That is, the regions sorted in descending order of sales.
I tried the code below:
groups = df.groupby(["Region", "Year"])["Sales"].sum()
groups2=groups.sort_values(axis=[0][1],ascending=False)
groups.plot.bar(color="blue")
plt.show()
But I get a list index out of range error. Using axis=[0] does not solve the problem.
Thank you very much for your help in advance!

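Following ChrisD's advice, you can obtain a working result with seaborn's catplot, which displays the bars in separate facets, one per region. Since groups is a Series with a (Region, Year) MultiIndex, call reset_index() first so that Region, Year and Sales become columns:
sns.catplot(x='Year', y='Sales', col='Region', data=groups.reset_index(), kind='bar')
You may have to adjust the aspect ratio for your display.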
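For the sorting question, here is a minimal sketch that uses only pandas and matplotlib rather than seaborn. It assumes a small made-up DataFrame (the sample values are hypothetical) and orders the regions by their total sales across all years. Unstacking the years into columns draws one cluster of bars per region, which also puts a visible gap between regions.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend for this sketch; drop it for interactive use
import pandas as pd

# Hypothetical sample data standing in for the real sales table.
df = pd.DataFrame({
    "Region": ["North", "North", "South", "South", "East", "East"],
    "Year": [2019, 2020, 2019, 2020, 2019, 2020],
    "Sales": [100, 120, 300, 280, 50, 70],
})

# One row per region, one column per year.
table = df.groupby(["Region", "Year"])["Sales"].sum().unstack("Year")

# Order the regions by total sales, largest first.
order = table.sum(axis=1).sort_values(ascending=False).index
table = table.loc[order]

# Each region becomes one cluster of bars, one bar per year.
ax = table.plot.bar(rot=0)
ax.set_ylabel("Sales")
# Call matplotlib.pyplot.show() here when running interactively.
```

Sorting by the total is only one reading of "most sales"; swap table.sum(axis=1) for table.max(axis=1) or a single year's column if that matches the intent better.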

Related

Identifying Plot Name or Visualization Implementation

I'm working on a dataset of SMS records [datetime_entry, sms_sent] and I'm looking to copy a really effective trend visual from a well-cited electricity demand study. Does anyone know the name of this plot, or how to implement something similar in Python (I'm not sure the original was done in Python)?
I know how to subplot the 4 charts after splitting the data by quarter, I'm just stumped on the plot type and stylization.
This is what matplotlib calls an eventplot.
Essentially, each vertical line represents one occurrence of a MWh demand value during that specific hour. So each row in the plot should have as many vertical lines as there are days in that quarter.
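A minimal sketch of that layout with matplotlib's eventplot, assuming synthetic demand values (the numbers below are made up, not taken from the study): one row per hour of the day, one vertical line per day in the quarter.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend for this sketch; drop it for interactive use
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(0)
n_days, n_hours = 90, 24
hours = np.arange(n_hours)

# Synthetic demand: a daily cycle plus noise, shape (n_days, n_hours).
base = 5000 + 1500 * np.sin((hours - 6) / 24 * 2 * np.pi)
demand = base + rng.normal(0, 300, size=(n_days, n_hours))

fig, ax = plt.subplots(figsize=(8, 6))
# One sequence of events per hour; each event is one day's demand in that hour.
rows = [demand[:, h] for h in hours]
collections = ax.eventplot(rows, lineoffsets=hours, linelengths=0.8, alpha=0.3)
ax.set_xlabel("Demand (MWh)")
ax.set_ylabel("Hour of day")
```

The alpha value controls how quickly overlapping lines build up into darker bands, so it usually needs tuning to the density of the data.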
While it works in this plot for these data, relying on the combination of alpha level and data density can be unreliable as the data change, because the number of overlapping points is not readily visible. So you can also create a similar visualization using hist2d, where you manually specify your bins.
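The hist2d variant could look roughly like this, again on synthetic demand values (an assumption for illustration). The bin edges are given explicitly: one bin per hour on the y-axis and 40 evenly spaced demand bins on the x-axis.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend for this sketch; drop it for interactive use
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(0)
n_days, n_hours = 90, 24
hours = np.arange(n_hours)

# Synthetic demand: a daily cycle plus noise, shape (n_days, n_hours).
base = 5000 + 1500 * np.sin((hours - 6) / 24 * 2 * np.pi)
demand = base + rng.normal(0, 300, size=(n_days, n_hours))

# Flatten to paired (demand, hour) observations.
x = demand.ravel()
y = np.tile(hours, n_days)

# Manually chosen bin edges: 40 demand bins, one bin centered on each hour.
demand_bins = np.linspace(x.min(), x.max(), 41)
hour_bins = np.arange(n_hours + 1) - 0.5

fig, ax = plt.subplots(figsize=(8, 6))
counts, xedges, yedges, image = ax.hist2d(x, y, bins=[demand_bins, hour_bins], cmap="viridis")
fig.colorbar(image, ax=ax, label="Days")
ax.set_xlabel("Demand (MWh)")
ax.set_ylabel("Hour of day")
```

Unlike stacked transparent lines, the colorbar here shows directly how many days fall into each cell.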

Is there a simple way to plot multiple series on one pandas scatter plot?

I come across this issue constantly, and my current solution is to create additional dataframes. I feel like there must be an easier solution.
Here is an example of data where I have multiple countries with multiple attributes:
If I wanted to plot Population vs. Depression (%) I would write:
ax = df.plot.scatter(x='Population', y='Depression (%)')
This isn't super helpful, as there are clearly lines linked to specific Countries (df['Country']). Is there a simple way to plot a scatter plot with different series (colors/shapes/etc) as different Countries?
Right now I use groupby to separate out individual Countries and plot them on the same axes (ax = ax).
Any thoughts or input would be greatly appreciated! Thank you!
Try c="Country", and if you want some nice colors you can add colormap='viridis'. For example, from the documentation:
ax2 = df.plot.scatter(x='length',
                      y='width',
                      c='species',
                      colormap='viridis')
Since your Country column holds strings, this approach can't be used directly; the values need to be converted to numbers first. This can be done by writing:
c=df['Country'].astype("category").cat.codes
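Put together, a minimal sketch might look like this. The column names come from the question, while the sample values are made up for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend for this sketch; drop it for interactive use
import pandas as pd

# Hypothetical sample data with the column names from the question.
df = pd.DataFrame({
    "Country": ["A", "A", "B", "B", "C", "C"],
    "Population": [10, 12, 50, 55, 200, 210],
    "Depression (%)": [3.1, 3.3, 4.0, 4.2, 2.5, 2.6],
})

# Map each country name to an integer code (0, 1, 2, ...).
codes = df["Country"].astype("category").cat.codes

# Pass the codes as the color values; the colormap turns them into colors.
ax = df.plot.scatter(x="Population", y="Depression (%)", c=codes, colormap="viridis")
```

The colorbar then shows the integer codes rather than the country names, so a loop over df.groupby("Country") that plots onto a shared ax is still the simpler route if you need a proper legend.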

Frequency vs total count bar plot in python

I have a dataframe with the following structure:
,mphA,gyrA,parC,tet59,qnrVC
sample1,TRUE,FALSE,FALSE,FALSE,FALSE
sample2,TRUE,FALSE,FALSE,FALSE,TRUE
sample3,FALSE,FALSE,FALSE,TRUE,FALSE
sample4,FALSE,FALSE,FALSE,TRUE,TRUE
sample5,TRUE,FALSE,TRUE,FALSE,TRUE
sample6,TRUE,TRUE,FALSE,FALSE,FALSE
sample7,TRUE,TRUE,TRUE,FALSE,TRUE
sample8,TRUE,TRUE,TRUE,TRUE,TRUE
sample9,FALSE,TRUE,TRUE,FALSE,TRUE
sample10,TRUE,TRUE,FALSE,FALSE,TRUE
And I need to generate a frequency vs. total count bar plot similar to the following figure in Python. It's a combination of three plots, so I guess they need to be plotted independently and placed on a single canvas. I see this plot frequently in journals, so I guess it is already implemented somewhere. However, I did not have any success searching online. Does anybody know how it can be done? Thanks.
It can be done easily using UpSetPlot
https://pypi.org/project/UpSetPlot/
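As a rough sketch of the numbers behind such a plot, the per-gene totals and the size of each gene combination can be computed with pandas alone, assuming the table above is loaded with the sample names as the index. This only prepares the counts; the UpSetPlot API itself is not shown here.

```python
import io
import pandas as pd

csv_text = """,mphA,gyrA,parC,tet59,qnrVC
sample1,TRUE,FALSE,FALSE,FALSE,FALSE
sample2,TRUE,FALSE,FALSE,FALSE,TRUE
sample3,FALSE,FALSE,FALSE,TRUE,FALSE
sample4,FALSE,FALSE,FALSE,TRUE,TRUE
sample5,TRUE,FALSE,TRUE,FALSE,TRUE
sample6,TRUE,TRUE,FALSE,FALSE,FALSE
sample7,TRUE,TRUE,TRUE,FALSE,TRUE
sample8,TRUE,TRUE,TRUE,TRUE,TRUE
sample9,FALSE,TRUE,TRUE,FALSE,TRUE
sample10,TRUE,TRUE,FALSE,FALSE,TRUE
"""

# Parse TRUE/FALSE as booleans, with sample names as the index.
df = pd.read_csv(
    io.StringIO(csv_text),
    index_col=0,
    true_values=["TRUE"],
    false_values=["FALSE"],
)

# Total count per gene (the side bar chart in an UpSet-style figure).
totals = df.sum().sort_values(ascending=False)

# Number of samples sharing each exact combination of genes (the top bar chart).
combos = df.groupby(list(df.columns)).size().sort_values(ascending=False)

print(totals)
print(combos)
```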

Whisker plots are not clear enough to analyze data

I'm trying to analyze a set of costs using python.
The columns in the data frame are,
'TotalCharges', 'TotalPayments', 'TotalDirectVariableCost', 'TotalDirectFixedCost', 'TotalIndirectVariableCost', 'TotalIndirectFixedCost'.
When I tried to plot them using box-and-whisker plots, this is how they were displayed:
I need to properly analyze these data and understand their behavior.
The following are my questions.
Is there any way that I can make the whisker plots clearer?
I believe that since these are costs, we cannot discard them as outliers. So, keeping the data as it is, what else can I use to represent the data more clearly?
Thanks
There are a couple of things you could do:
larger print area
rotate the axis
plot one axis log scale
That said, I think you should examine once again your understanding of what a box and whisker plot is for.
Additionally, you might consider posting this on the Math or Cross Validated site as this doesn't have much to do with code.
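To illustrate the first and third suggestions, here is a minimal sketch with a larger figure and a log-scaled value axis. The data is synthetic (log-normal draws standing in for two of the cost columns), so only the plotting calls carry over to the real data frame.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend for this sketch; drop it for interactive use
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)

# Synthetic, heavily right-skewed stand-ins for two of the cost columns.
costs = pd.DataFrame({
    "TotalCharges": rng.lognormal(mean=8, sigma=1.5, size=500),
    "TotalPayments": rng.lognormal(mean=7, sigma=1.5, size=500),
})

# Larger print area.
fig, ax = plt.subplots(figsize=(10, 6))
costs.plot.box(ax=ax)

# Log scale on the value axis keeps the boxes visible next to the extreme values.
ax.set_yscale("log")
ax.set_ylabel("Cost (log scale)")
```

A log axis only works for strictly positive values, so zero or negative costs would need separate handling.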

How to plot a dataframe that contains values spread over a large spectrum of values?

I have the following dataframe, resulted from running grid search over several regression models:
As can be seen, many values are grouped around 0.0009, but several are a few orders of magnitude larger in absolute value (-1.6, -2.3 etc).
I would like to plot these results, but I don't seem to find a way to get a readable plot. I have tried a bar plot, but I get something like:
How can I make this bar plot more readable? Or what other kind of plot would be more suitable to visualize such data?
Edit: Here is the dataframe, exported as CSV:
,a,b,c,d
LinearRegression,0.000858399508896,-4.11609208874e+20,0.000952538859738,0.000952538859733
RandomForestRegressor,-1.62264355718,-2.30218457629,0.0008957696846039999,0.0008990722465239999
ElasticNet,0.000883257900658,0.0008525502791760002,0.000884706195921,0.000929498696126
Lasso,7.92193516085e-05,-1.84086765436e-05,7.92193516085e-05,-1.84086765436e-05
ExtraTreesRegressor,-6.320170496909999,-6.30420308033,,
Ridge,0.0008584791396339999,0.0008601028734780001,,
SGDRegressor,-4.62522968756,,,
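You could give the graph a log scale, which is often used for plotting data with a very large range. This muddies the interpretation slightly, since equal distances on the axis now represent equal ratios (orders of magnitude) rather than equal differences. Note that a plain log scale cannot show zero or negative values, which this data contains, so a symmetric log scale ("symlog" in matplotlib) is a better fit. You can read about log scales here: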
https://en.wikipedia.org/wiki/Logarithmic_scale
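A minimal sketch of this idea on the CSV from the question. Because several scores are negative, it uses matplotlib's "symlog" scale, which is linear near zero and logarithmic beyond a threshold. The threshold value 1e-5 is an arbitrary choice for illustration, and the linthresh keyword assumes matplotlib 3.3 or newer.

```python
import io
import matplotlib
matplotlib.use("Agg")  # headless backend for this sketch; drop it for interactive use
import matplotlib.pyplot as plt
import pandas as pd

csv_text = """,a,b,c,d
LinearRegression,0.000858399508896,-4.11609208874e+20,0.000952538859738,0.000952538859733
RandomForestRegressor,-1.62264355718,-2.30218457629,0.0008957696846039999,0.0008990722465239999
ElasticNet,0.000883257900658,0.0008525502791760002,0.000884706195921,0.000929498696126
Lasso,7.92193516085e-05,-1.84086765436e-05,7.92193516085e-05,-1.84086765436e-05
ExtraTreesRegressor,-6.320170496909999,-6.30420308033,,
Ridge,0.0008584791396339999,0.0008601028734780001,,
SGDRegressor,-4.62522968756,,,
"""

# Model names become the index; empty cells become NaN and are simply not drawn.
scores = pd.read_csv(io.StringIO(csv_text), index_col=0)

fig, ax = plt.subplots(figsize=(10, 5))
scores.plot.bar(ax=ax)

# Symmetric log scale: linear within +/- linthresh, logarithmic outside it.
ax.set_yscale("symlog", linthresh=1e-5)
ax.set_ylabel("Score (symlog scale)")
fig.tight_layout()
```

The single value near -4e20 still dominates the axis range, so dropping or clipping it before plotting may give a more readable result.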
