python plotly plot value to thousands - python

I am using plotly and want to abbreviate large values with a suffix such as k or M. For example, a value shown on the chart as 8247294 should be displayed as 8.25M.
I tried something like this:
x = [x for x in range(1,len(table))] #date
y = table['revenue'].values.tolist()
fig = go.Figure(go.Scatter(x=x, y=y, text=y, mode="lines+markers+text",
                           line=dict(color='firebrick', width=4)))
fig.update_layout(width=900, height=650)
fig.update_layout(
    tickformat='k')
It is not working. What is the correct way to do this?
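One possible fix, sketched under assumptions: tickformat is a property of an axis rather than of the layout itself, so it has to be set on the y-axis (for example as yaxis_tickformat), and the d3 format string ".3s" gives SI-suffixed ticks such as 8.25M. The helper si_label, the function build_figure and the sample revenue list below are hypothetical names introduced only for illustration. The helper builds the point labels in plain Python, while the axis ticks rely on plotly's own formatting.

```python
def si_label(value, decimals=2):
    """Return value with a k/M/G/T suffix, e.g. 8247294 -> '8.25M'."""
    suffixes = ["", "k", "M", "G", "T"]
    scaled = float(value)
    index = 0
    # Divide by 1000 until the number is below 1000 or suffixes run out.
    while abs(scaled) >= 1000 and index < len(suffixes) - 1:
        scaled /= 1000.0
        index += 1
    return f"{scaled:.{decimals}f}{suffixes[index]}"


def build_figure(revenue):
    """Line chart with SI-formatted point labels and y-axis ticks."""
    # Imported here so si_label stays usable without plotly installed.
    import plotly.graph_objects as go

    x = list(range(1, len(revenue) + 1))
    labels = [si_label(v) for v in revenue]
    fig = go.Figure(go.Scatter(x=x, y=revenue, text=labels,
                               mode="lines+markers+text",
                               textposition="top center",
                               line=dict(color="firebrick", width=4)))
    # tickformat belongs to the axis, not to the layout itself.
    fig.update_layout(width=900, height=650, yaxis_tickformat=".3s")
    return fig


# Hypothetical sample data standing in for table['revenue'].
sample_revenue = [8247294, 912500, 15300000]
print([si_label(v) for v in sample_revenue])
```

Calling build_figure(sample_revenue).show() should then display the chart. As far as I know, setting texttemplate="%{y:.3s}" on the trace is an alternative that formats the point labels without a Python helper, but treat that as something to verify against your plotly version.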

Related

Plotly: How to add annotations to different intervals of the y-axis?

I'm trying to add annotations to the y-axis based on different intervals of the y value:
if y > 0, I want the annotation Flexion
if y < 0, I want the annotation Extension
I tried to use a multicategory axis to specify the annotations.
My code is shown below:
import plotly.graph_objects as go
import numpy as np
x = np.arange(-10,10,1)
y = np.arange(-10,10,1)
y_annotation = [ 'Flexion' if data > 0 else 'Extension' for data in y ]
fig = go.Figure( data= go.Scatter(x=x,y=[y_annotation,y]) )
fig.show()
This will produce
but I don't want the lines that separate Flexion and Extension,
and this method also shows detailed y values on the y-axis, which I don't want either.
Is there another way to add annotations to the y-axis based on different intervals?
Thanks!
If you're happy with the setup above apart from the lines and the detailed y-axis, you can drop the multicategory approach and place annotations at the appropriate positions using fig.add_annotation().
The following figure is produced with the snippet below, which:
makes room for your annotations on the left side using fig.update_layout(margin=dict(l=150)),
stores interval names and bounds in a dict,
calculates the middle value of each specified interval,
places the annotations to the left of the y-axis using xref="paper", and
does not mess up the values of the y-axis tick marks.
Plot
Complete code:
import plotly.graph_objects as go
import numpy as np
x = np.arange(-10,10,1)
y = np.arange(-10,10,1)
y_annotation = [ 'Flexion' if data > 0 else 'Extension' for data in y ]
intervals = {'Flexion':[0,10],
             'Extension':[0, -10]}
# plotly setup
fig = go.Figure( data= go.Scatter(x=x,y=y) )
# make room for annotations
fig.update_layout(margin=dict(l=150))
for k in intervals.keys():
    fig.add_annotation(dict(font=dict(color="green",size=14),
                            #x=x_loc,
                            x=-0.16,
                            y=(intervals[k][0]+intervals[k][1])/2,
                            showarrow=False,
                            text="<i>"+k+"</i>",
                            textangle=0,
                            xref="paper",
                            yref="y"
                            ))
fig.show()

stacked barplot with total and edited axis limit - python

I'm trying to do a stacked barplot, but it seems to be pretty tricky with seaborn. I have this data:
import numpy as np
import pandas as pd
import seaborn as sns
x = pd.DataFrame({"Groups" : np.random.choice(["Group1", "Group2", "Group3"], 100),
                  "Sex" : np.random.choice(["Masculine", "Femenine"], 100)})
x = x.groupby(["Groups", "Sex"]).size().reset_index(name="count")
# share of each sex within its group, in percent
x["percent (%)"] = (x.groupby("Groups")["count"].transform(lambda s: s / s.sum()) * 100).round(1)
x
And I have this plot:
sns.barplot(x="Groups", y="percent (%)", hue="Sex", data=x);
However, I want each group to have a stacked bar, the y-axis to run from 0 to 1, and a "group4" showing the total. When I try to set the limits like here, I get an error because this seaborn graph doesn't allow it. Every stacked barplot from seaborn I have found uses one column per group, holding that group's values, whereas I have all the groups in a single column. Any ideas?
I'm looking for a simple solution (with or without seaborn) that doesn't change the structure of the data, except for adding the "total group". I don't know whether it's easier to add the total to the data or to compute it inside the graph.
Not sure what group4 would look like, but here's a stacked bar graph with an added Total bar:
x = pd.DataFrame({"Groups" : np.random.choice(["Group1", "Group2", "Group3"], 100),
                  "Sex" : np.random.choice(["Masculine", "Femenine"], 100)})
xf = x.groupby(["Groups"])['Sex'].value_counts().unstack('Groups')
xf['Total'] = xf.sum(1)
xf.div(xf.sum()).T.plot.bar(stacked=True)
Output:

Matplotlib both axis values overlapping

I've just started using Matplotlib. I imported a CSV file from a URL; it contains 190+ entries for countries along with the region each country belongs to (for example, India in Asia). I am able to plot all the data, but with this much data the x-axis and y-axis values overlap each other and the plot gets messy.
Code:
country_cols = ['Country', 'Region']
country_data = pd.read_csv("https://raw.githubusercontent.com/cs109/2014_data/master/countries.csv",names=country_cols)
country_list = country_data.Country.tolist()
region_list = country_data.Region.tolist()
plt.plot(region_list,country_list)
And the output looks like this
For the sake of learning I am using a simple line chart. I would also like to know which graph type should be used to represent such data. Any help would be appreciated.
I think you need fig.autofmt_xdate(), which rotates the x-axis tick labels so they are less likely to overlap.
Try this code:
country_cols = ['Country', 'Region']
country_data = pd.read_csv("https://raw.githubusercontent.com/cs109/2014_data/master/countries.csv",names=country_cols)
country_list = country_data.Country.tolist()
region_list = country_data.Region.tolist()
fig = plt.figure()
plt.plot(region_list,country_list)
fig.autofmt_xdate()
plt.show()
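On the question of which chart type suits this data: country names plotted against region names have no meaningful order for a line, so one option is to count the countries per region and draw a bar chart. What follows is only a sketch of that idea, not the method from the answer above. The function names and the tiny sample list are made up for illustration; the real input would be region_list.

```python
from collections import Counter


def countries_per_region(regions):
    """Count how many countries fall into each region, largest first."""
    return Counter(regions).most_common()


def plot_region_counts(regions):
    """Draw a horizontal bar chart with one bar per region."""
    # Imported here so the counting helper works without matplotlib.
    import matplotlib.pyplot as plt

    counts = countries_per_region(regions)
    names = [name for name, _ in counts]
    totals = [total for _, total in counts]
    fig, ax = plt.subplots()
    ax.barh(names, totals)
    ax.set_xlabel("Number of countries")
    ax.invert_yaxis()  # put the largest region at the top
    fig.tight_layout()
    return fig


# Hypothetical sample standing in for region_list.
sample_regions = ["ASIA", "EUROPE", "ASIA", "AFRICA", "ASIA", "EUROPE"]
print(countries_per_region(sample_regions))
```

With the real data, plot_region_counts(region_list) followed by plt.show() would give a handful of bars instead of 190+ overlapping tick labels.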

How to plot pandas grouped values using pygal?

I have a csv like this:
name,version,color
AA,"version 1",yellow
BB,"version 2",black
CC,"version 3",yellow
DD,"version 1",black
AA,"version 1",green
BB,"version 2",green
FF,"version 3",green
GG,"version 3",red
BB,"version 3",yellow
BB,"version 2",red
BB,"version 1",black
I would like to draw a bar chart, which shows versions on x axis and an amount (number) of different colors on y axis.
So I want to group DataFrame by version, check which colors belong to a particular version, count colors and display the results on the pygal bar chart.
It should look similar to this:
What I tried so far:
df = pd.read_csv(results)
new_df = df.groupby('version')['color'].value_counts()
bar_chart = pygal.Bar(width=1000, height=600,
                      legend_at_bottom=True, human_readable=True,
                      title='versions vs colors',
                      x_title='Version',
                      y_title='Number')
versions = []
for index, row in new_df.iteritems():
    versions.append(index[0])
    bar_chart.add(index[1], row)
bar_chart.x_labels = map(str, versions)
bar_chart.render_to_file('bar-chart.svg')
Unfortunately, this does not work: it cannot match each group of colors to the proper version.
I also tried using matplotlib.pyplot and it works like a charm:
pd.crosstab(df['version'],df['color']).plot.bar(ax=ax)
plt.draw()
This works as well:
df.groupby(['version','color']).size().unstack(fill_value=0).plot.bar()
But the generated chart is not accurate enough for me. I would like to have a pygal chart.
I also checked:
How to plot pandas groupby values in a graph?
How to plot a pandas dataframe?
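A possible direction, offered as a sketch rather than a confirmed answer: pygal expects one series per legend entry, with one value per x label, whereas the loop above adds every (version, color) count as its own single-value series. Reshaping the counts into one list per color, aligned with the version labels, should fix the mismatch. The helper names below are hypothetical, and the pairs are typed in by hand from the CSV in the question; with pandas they could presumably be built with list(zip(df['version'], df['color'])).

```python
from collections import Counter


def color_series_by_version(pairs):
    """Turn (version, color) pairs into x labels plus one series per color."""
    counts = Counter(pairs)
    versions = sorted({version for version, _ in pairs})
    colors = sorted({color for _, color in pairs})
    # Each series holds one count per version, in x-label order.
    series = {color: [counts[(version, color)] for version in versions]
              for color in colors}
    return versions, series


def render_bar_chart(pairs, path="bar-chart.svg"):
    """Render a grouped pygal bar chart: versions on x, one bar per color."""
    import pygal  # imported here so the counting helper runs without pygal

    versions, series = color_series_by_version(pairs)
    chart = pygal.Bar(width=1000, height=600, legend_at_bottom=True,
                      title="versions vs colors",
                      x_title="Version", y_title="Number")
    chart.x_labels = versions
    for color, values in series.items():
        chart.add(color, values)
    chart.render_to_file(path)


# The (version, color) pairs from the CSV in the question.
pairs = [("version 1", "yellow"), ("version 2", "black"), ("version 3", "yellow"),
         ("version 1", "black"), ("version 1", "green"), ("version 2", "green"),
         ("version 3", "green"), ("version 3", "red"), ("version 3", "yellow"),
         ("version 2", "red"), ("version 1", "black")]
versions, series = color_series_by_version(pairs)
print(versions, series)
```

Calling render_bar_chart(pairs) would then write the SVG. Missing combinations show up as 0, so every color has a value for every version.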

How to change x axis increments and plot using log(x) on the xaxis?

I would like to shorten my xaxis so my data will be more visible. However, I don't know how to accomplish this while leaving my xaxis as log(x).
Here is my code for the above image:
data = Data([
    Bar(
        y=[x/float(114767406) for x in yp_views],
        x=[x for x in yp_views],
        name='Relative Frequency')])
layout = Layout(xaxis=XAxis(type='log', title="Number of Premium Highlight Views"),
                yaxis=YAxis(title="Frequency"))
fig = Figure(data = data, layout = layout)
py.iplot(fig)
Here is what I tried:
I tried solving this problem by using histogram and the xbins. However, this doesn't allow me the freedom of using a custom x and y axis to plot. I don't see a xbins property for bar charts. Is there another name for it?
Here is my attempt to plot using the range:
data = Data([
    Bar(
        y=[x/float(114767406) for x in yp_views],
        x=[x for x in yp_views],
        name='Relative Frequency')])
layout = Layout(xaxis=XAxis(type='log', range=[3000, 10000], title="Number of Premium Highlight Views"),
                yaxis=YAxis(title="Frequency"))
fig = Figure(data = data, layout = layout)
py.iplot(fig)
You can use range=[min, max] in your xaxis / yaxis to define the desired range. Note that for a log axis the range must be given as log10 of the values, not as the raw values. For example, your layout would look something like this:
layout = Layout(xaxis=XAxis(type='log', range=[np.log10(3000), np.log10(10000)],
                            title="Number of Premium Highlight Views"),
                yaxis=YAxis(title="Frequency"))
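The same idea can be sketched with the newer plotly.graph_objects API, assuming a recent plotly version instead of the legacy Data/XAxis classes used above. A log axis takes its range as exponents of ten, so the raw limits are converted first. The names log_axis_range and build_log_bar_figure are hypothetical helpers, not part of plotly.

```python
import math


def log_axis_range(lower, upper):
    """Convert raw axis limits to the log10 values a log axis expects."""
    if lower <= 0 or upper <= 0:
        raise ValueError("log axis limits must be positive")
    return [math.log10(lower), math.log10(upper)]


def build_log_bar_figure(views, total):
    """Bar chart of relative frequency with a zoomed log x-axis."""
    # Imported here so log_axis_range works without plotly installed.
    import plotly.graph_objects as go

    fig = go.Figure(go.Bar(x=views, y=[v / float(total) for v in views],
                           name="Relative Frequency"))
    fig.update_xaxes(type="log", range=log_axis_range(3000, 10000),
                     title_text="Number of Premium Highlight Views")
    fig.update_yaxes(title_text="Frequency")
    return fig


print(log_axis_range(3000, 10000))
```

With the data from the question, build_log_bar_figure(yp_views, 114767406).show() should display only the part of the axis between 3000 and 10000.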
