Display bars in plot by ascending/descending order, Matplotlib, Python - python

I have a DataFrame with several numeric columns. My task is to display sums of each column in one chart. I decided to use import matplotlib as plt and a next code
df.sum().plot.bar(figsize = (10,5))
plt.xlabel('Category')
plt.ylabel('Sum')
plt.title('Mast popular categories')
plt.show()
I've got this pic
What I want to change here is to replace bars by ascending/descending order (for example from biggest bar to smallest). But I can't succeed. Help me please to solve this!

Sort the data before plot:
df.sum().sort_values(ascending=False).plot.bar(figsize = (10,5))

Related

How to show the X-axis date ticks neatly or in a form of certain interval in a plotly chart

I am making an OHLC graph using plotly. I have stumbled across one issue. The labels in the x-axis is looking really messy . Is there a way to make it more neat. Or can we only show the extreme date values. For example only the first date value and last date value is show. The date range is a dynamic in nature. I am using the below query to make the graph . Thanks for the help.
fig = go.Figure(data=go.Candlestick(x=tickerDf.index.date,
open=tickerDf.Open,
high=tickerDf.High,
low=tickerDf.Low,
close=tickerDf.Close) )
fig.update_xaxes(showticklabels=True ) #Disable xticks
fig.update_layout(width=800,height=600,xaxis=dict(type = "category") ) # hide dates with no values
st.plotly_chart(fig)
Here tickerDf is the dataframe which contains the stock related data.
One way that you can use is changing the nticks. This can be done by calling fig.update_xaxes() and passing nticks as the parameter. For example, here's a plot with the regular amount of ticks, with no change.
and here is what it looks like after specifying the number of ticks:
The code for the second plot:
import plotly.graph_objects as go
import pandas as pd
df = pd.read_csv('./finance-charts-apple.csv')
fig = go.Figure([go.Scatter(x=df['Date'], y=df['AAPL.High'])])
fig.update_xaxes(nticks=5)
fig.show()
the important line, again is:
fig.update_xaxes(nticks=5)

Size legend for plotly express scatterplot in Python

Here is a Plotly Express scatterplot with marker color, size and symbol representing different fields in the data frame. There is a legend for symbol and a colorbar for color, but there is nothing to indicate what marker size represents.
Is it possible to display a "size" legend? In the legend I'm hoping to show some example marker sizes and their respective values.
A similar question was asked for R and I'm hoping for a similar results in Python. I've tried adding markers using fig.add_trace(), and this would work, except I don't know how to make the sizes equal.
import pandas as pd
import plotly.express as px
import random
# create data frame
df = pd.DataFrame({
'X':list(range(1,11,1)),
'Y':list(range(1,11,1)),
'Symbol':['Yes']*5+['No']*5,
'Color':list(range(1,11,1)),
'Size':random.sample(range(10,150), 10)
})
# create scatterplot
fig = px.scatter(df, y='Y', x='X',color='Color',symbol='Symbol',size='Size')
# move legend
fig.update_layout(legend=dict(y=1, x=0.1))
fig.show()
Scatterplot Image:
Thank you
You can not achieve this goal, if you use a metric scale/data like in your range. Plotly will try to always interpret it like metric, even if it seems/is discrete in the output. So your data has to be a factor like in R, as you are showing groups. One possible solution could be to use a list comp. and convert everything to a str. I did it in two steps so you can follow:
import pandas as pd
import plotly.express as px
import random
check = sorted(random.sample(range(10,150), 10))
check = [str(num) for num in check]
# create data frame
df = pd.DataFrame({
'X':list(range(1,11,1)),
'Y':list(range(1,11,1)),
'Symbol':['Yes']*5+['No']*5,
'Color':check,
'Size':list(range(1,11,1))
})
# create scatterplot
fig = px.scatter(df, y='Y', x='X',color='Color',symbol='Symbol',size='Size')
# move legend
fig.update_layout(legend=dict(y=1, x=0.1))
fig.show()
That gives:
Keep in mind, that you also get the symbol label, as you now have TWO groups!
Maybe you want to sort the values in the list before converting to string!
Like in this picture (added it to the code above)
UPDATE
Hey There,
yes, but as far as I know, only in matplotlib, and it is a little bit hacky, as you simulate scatter plots. I can only show you a modified example from matplotlib, but maybe it helps you so you can fiddle it out by yourself:
from numpy.random import randn
z = randn(10)
red_dot, = plt.plot(z, "ro", markersize=5)
red_dot_other, = plt.plot(z*2, "ro", markersize=20)
plt.legend([red_dot, red_dot_other], ["Yes", "No"], markerscale=0.5)
That gives:
As you can see you are working with two different plots, to be exact one plot for each size legend. In the legend these plots are merged together. Legendsize is further steered through markerscale and it is linked to markersize of each plot. And because we have two plots with TWO different markersizes, we can create a plot with different markersizes in the legend. markerscale is normally a value between 0 and 1 but you can also do 150% thus 1.5.
You can achieve this through fiddling around with the legend handler in matplotlib see here:
https://matplotlib.org/stable/tutorials/intermediate/legend_guide.html

How to improve this seaborn countplot?

I used the following code to generate the countplot in python using seaborn:
sns.countplot( x='Genres', data=gn_s)
But I got the following output:
I can't see the items on x-axis clearly as they are overlapping. How can I correct that?
Also I would like all the items to be arranged in a decreasing order of count. How can I achieve that?
You can use choose the x-axis to be vertical, as an example:
g = sns.countplot( x='Genres', data=gn_s)
g.set_xticklabels(g.get_xticklabels(),rotation=90)
Or, you can also do:
plt.xticks(rotation=90)
Bring in matplotlib to set up an axis ahead of time, so that you can modify the axis tick labels by rotating them 90 degrees and/or changing font size. To arrange your samples in order, you need to modify the source. I assume you're starting with a pandas dataframe, so something like:
data = data.sort_values(by='Genres', ascending=False)
labels = # list of labels in the correct order, probably your data.index
fig, ax1 = plt.subplots(1,1)
sns.countplot( x='Genres', data=gn_s, ax=ax1)
ax1.set_xticklabels(labels, rotation=90)
would probably help.
edit Taking andrewnagyeb's suggestion from the comments to order the plot:
sns.countplot( x='Genres', data=gn_s, order = gn_s['Genres'].value_counts().index)

Sorted bar charts with pandas/matplotlib or seaborn

I have a dataset of 5000 products with 50 features. One of the column is 'colors' and there are more than 100 colors in the column. I'm trying to plot a bar chart to show only the top 10 colors and how many products there are in each color.
top_colors = df.colors.value_counts()
top_colors[:10].plot(kind='barh')
plt.xlabel('No. of Products');
Using Seaborn:
sns.factorplot("colors", data=df , palette="PuBu_d");
1) Is there a better way to do this?
2) How can i replicate this with Seaborn?
3) How do i plot such that the highest count is at the top (i.e black at the very top of the bar chart)
An easy trick might be to invert the y axis of your plot, rather than futzing with the data:
s = pd.Series(np.random.choice(list(string.uppercase), 1000))
counts = s.value_counts()
ax = counts.iloc[:10].plot(kind="barh")
ax.invert_yaxis()
Seaborn barplot doesn't currently support horizontally oriented bars, but if you want to control the order the bars appear in you can pass a list of values to the x_order param. But I think it's easier to use the pandas plotting methods here, anyway.
If you want to use pandas then you can first sort:
top_colors[:10].sort(ascending=0).plot(kind='barh')
Seaborn already styles your pandas plots, but you can also use:
sns.barplot(top_colors.index, top_colors.values)

Plotting multiple timeseries power data using matplotlib and pandas

I have a csv file of power levels at several stations (4 in this case, though "HUT4" is not in this short excerpt):
2014-06-21T20:03:21,HUT3,74
2014-06-21T21:03:16,HUT1,70
2014-06-21T21:04:31,HUT3,73
2014-06-21T21:04:33,HUT2,30
2014-06-21T22:03:50,HUT3,64
2014-06-21T23:03:29,HUT1,60
(etc . .)
The times are not synchronised across stations. The power level is (in this case) integer percent. Some machines report in volts (~13.0), which would be an additional issue when plotting.
The data is easy to read into a dataframe, to index the dataframe, to put into a dictionary. But I can't get the right syntax to make a meaningful plot. Either all stations on a single plot sharing a timeline that's big enough for all stations, or as separate plots, maybe a subplot for each station. If I do:
import pandas as pd
df = pd.read_csv('Power_Log.csv',names=['DT','Station','Power'])
df2=df.groupby(['Station']) # set 'Station' as the data index
d = dict(iter(df2)) # make a dictionary including each station's data
for stn in d.keys():
d[stn].plot(x='DT',y='Power')
plt.legend(loc='lower right')
plt.savefig('Station_Power.png')
I do get a plot but the X axis is not right for each station.
I have not figured out yet how to do four independent subplots, which would free me from making a wide-enough timescale.
I would greatly appreciate comments on getting a single plot right and/or getting good looking subplots. The subplots do not need to have synchronised X axes.
I'd rather plot the typical way, smth like:
import matplotlib.pyplot as plt
plt.plot([1,2,3,4], [1,4,9,16], 'ro')
plt.axis([0, 6, 0, 20])
plt.savefig()
( http://matplotlib.org/users/pyplot_tutorial.html )
Re more subplots: simply call plt.plot() multiple times, once for each data series.
P.S. you can set xticks this way: Changing the "tick frequency" on x or y axis in matplotlib?
Sorry for the comment above where I needed to add code. Still learning . .
From the 5th code line:
import matplotlib.dates as mdates
for stn in d.keys():
plt.figure()
d[stn].interpolate().plot(x='DT',y='Power',title=stn,rot=45)
plt.gca().xaxis.set_major_formatter(mdates.DateFormatter('%D/%M/%Y'))
plt.savefig('Station_Power_'+stn+'.png')
Does more or less what I want to do except the DateFormatter line does not work. I would like to shorten my datetime data to show just date. If it places ticks at midnight that would be brilliant but not strictly necessary.
The key to getting a continuous plot is to use the interpolate() method in the plot.
With this data having different x scales from station to station a plot of all stations on the same graph does not work. HUT4 in my data has far fewer records and only plots to about 25% of the scale even though the datetime values cover more or less the same range as the other HUTs.

Categories